Preface


M. Dolores Jiménez-López


Research Group on Mathematical Linguistics 
Universitat Rovira i Virgili 

Tarragona, Spain 

E-mail: mariadolores.jimenez@urv.cat


This volume contains a collection of articles that present different
mathematical approaches to languages. Languages can be natural or
formal/artificial. In both cases, we can define a language as a set of
sentences, where a sentence is a finite string of symbols over an alphabet.
Therefore, languages, natural or artificial, are particular cases of symbol
systems. The manipulation of symbols is the stem of formal language theory.
The theory of formal languages mainly originated from mathematics and
linguistics. From the area of mathematics, A. Thue and E. Post introduced the
formal notion of a rewriting system, while A. Turing introduced the general
idea of finding models of computing where the power of a model could be
described by the complexity of the language it generates/accepts. From
linguistics, the study of grammars and the grammatical structure of a language
was initiated by N. Chomsky in the 1950s. Chomsky introduced his grammar
hierarchy as a tool for modeling and investigating the syntax of natural
languages. After 1964, formal language theory developed as a separate branch
with specific problems, techniques and results, and since then it has had an
important role in the field of computer science; in fact, it is considered the
stem of theoretical computer science.

Taking into account the achievements of the theory of formal languages,
its initial linguistic motivation and its important role in computation, it is
not possible to do without this research area in the study of computing
languages, either formal or natural. Since it is part of what is called
theoretical computer science, nobody doubts the adequacy of formal language
theory in the field of programming languages. What is perhaps less obvious is
that all those devices can be very useful in the description, analysis and
processing of natural languages.

We live in an information society that needs linguistic technology to deal
with information. Language technologies can enrich our computational
environment with many of the natural functions characteristic of human
language. Language technologies demand formal models
that are able to capture the general structure of language and that, at the same 
time, are efficient from the computational point of view. Formal language the- 
ory can offer language technologies the mathematical tools they need for the 
definition of formal mechanisms for dealing with natural languages. Therefore, 
the application of formal languages to the description of natural language can 
throw up interesting results in the field of natural language research, both 
by reformulating the way of describing and explaining natural language and 
by providing formal models for linguistic manipulation that can be useful for 
machine translation, natural language generation, human-computer interfaces 
and all the areas in the field of artificial intelligence that imply the processing 
of natural language. 

The articles included in this volume are extended and improved versions of
works that were previously presented and discussed at the weekly seminars
organized by the Research Group on Mathematical Linguistics (GRLMC)
at the Universitat Rovira i Virgili (Tarragona, Spain).

I am deeply grateful to all the contributors, for their professional and 
friendly cooperation, and to Universitat Rovira i Virgili for supporting the 
publication of this volume. 


TRIANGLE 8 • June 2012


Learning $SEC_p$ Languages from Only Positive
Data*


Leonor Becerra-Bonache 


Research Group on Mathematical Linguistics 
Universitat Rovira i Virgili 

Tarragona, Spain 

E-mail: leonor.becerra@estudiants.urv.cat 


Summary. The field of Grammatical Inference provides a good theoretical frame- 
work for investigating a learning process. Formal results in this field can be relevant 
to the question of first language acquisition. However, Grammatical Inference stud- 
ies have been focused mainly on mathematical aspects, and have not exploited the 
linguistic relevance of their results. With this paper, we try to enrich Grammatical 
Inference studies with ideas from Linguistics. We propose a non-classical mecha- 
nism that has relevant linguistic and computational properties, and we study its 
learnability from positive data. 


1 Introduction 


Grammatical Inference (GI) is a subfield of Machine Learning that deals with
the learning of formal languages. Roughly speaking, a GI problem can be
defined as a game played between two players: a teacher and a learner. The
teacher provides data to the learner, and from this data the learner must
identify the underlying language [4]. The initial theoretical foundations of GI
were given by E.M. Gold [7], who was primarily motivated by the problem of
first language acquisition. Since his seminal work, research in GI has focused
on obtaining formal results (e.g., finding efficient methods for inferring
grammars). Besides this theoretical bent, GI algorithms have also been applied
to practical problems (e.g., Natural Language Processing, Computational
Biology, etc.). Excellent surveys on the field of GI can be found in [6, 17].

* This paper is based on [2] and [1].

Chomsky-inspired linguistic studies conceive grammar as a machine (in the
sense of the theory of formal languages) that children develop and reconstruct
very fast during the first years of their life. Children infer and select the
grammar of their language from the data that the surrounding world offers them.
Therefore, the proximity between GI and linguistic studies is considerable.
On the basis of these ideas, we try to bring together the theory of GI 
and studies of language acquisition, in pursuit of a final goal: to gain insight 
into the process of language acquisition. One concrete goal of this paper is 
to try to improve GI studies by using ideas from Linguistics. After present- 
ing formal preliminaries (Section 2), we review the classes of languages on 
which GI studies have focused and we discuss whether they are suitable for 
modelling natural language syntax (Section 3). Then, we propose to study 
a non-classical mechanism that has important linguistic and computational 
properties (Section 4), and we study its learnability from positive data (Sec- 
tion 5). Concluding remarks and future work are presented in Section 6. 


2 Preliminaries 


In this paper we follow standard definitions and notations in formal language 
theory. Supplementary information can be found in [8]. 

Given an alphabet $\Sigma$, the set of all strings over $\Sigma$ is denoted
by $\Sigma^*$. The set of nonempty strings over $\Sigma$ is denoted by
$\Sigma^+$. A language $L$ over $\Sigma$ is a subset of $\Sigma^*$. The
elements of $L$ are called strings or words. $\lambda$ is the empty string.
Assume that $a \in \Sigma$ and $w \in \Sigma^*$; the length of $w$ is denoted
by $|w|$, and the number of occurrences of $a$ in $w$ is denoted by $|w|_a$.

$\mathbb{N}$ denotes the set of natural numbers. Assume that
$\Sigma = \{a_1, a_2, \ldots, a_k\}$. The Parikh mapping, denoted by $\Psi$, is:


$\Psi : \Sigma^* \to \mathbb{N}^k$, $\Psi(w) = (|w|_{a_1}, |w|_{a_2}, \ldots, |w|_{a_k})$


If $L$ is a language, then the Parikh set of $L$ is defined by:

$\Psi(L) = \{\Psi(w) \mid w \in L\}$


A linear set is a set $M \subseteq \mathbb{N}^k$ such that
$M = \{v_0 + \sum_{i=1}^{m} v_i x_i \mid x_i \in \mathbb{N}\}$,
for some $v_0, v_1, \ldots, v_m$ in $\mathbb{N}^k$. A semilinear set is a finite
union of linear sets, and a semilinear language is a language $L$ such that
$\Psi(L)$ is a semilinear set.
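
As a small illustration of the Parikh mapping, the following Python sketch computes Parikh vectors for a finite sample of strings. The function names `parikh` and `parikh_set` are ours and purely illustrative; the empty Python string stands for the empty string $\lambda$, and the alphabet is passed as an ordered tuple so that the components of the vector follow the order $a_1, \ldots, a_k$.

```python
from collections import Counter

def parikh(word, alphabet):
    """Parikh vector of `word` with respect to the ordered `alphabet`."""
    counts = Counter(word)
    return tuple(counts[a] for a in alphabet)

def parikh_set(sample, alphabet):
    """Parikh set of a finite sample of strings."""
    return {parikh(w, alphabet) for w in sample}

alphabet = ("a", "b", "c")
# A finite sample of the language {a^n b^n c^n | n >= 0}.
sample = ["", "abc", "aabbcc", "aaabbbccc"]

print(parikh("abcab", alphabet))             # (2, 2, 1)
print(sorted(parikh_set(sample, alphabet)))  # [(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3)]
```

Every vector obtained from the sample is a multiple of (1, 1, 1), which is what one expects from a language whose Parikh set is the linear set $\{(0,0,0) + x \cdot (1,1,1) \mid x \in \mathbb{N}\}$.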

We denote by RE, CS, CF, LIN, and REG the families of languages
generated by arbitrary, context-sensitive, context-free, linear, and regular
grammars, respectively (RE stands for recursively enumerable). By FIN we
denote the family of finite languages. The following strict inclusions hold:
$FIN \subset REG \subset LIN \subset CF \subset CS \subset RE$. We call this
the Chomsky hierarchy.


3 Natural Languages and the Chomsky Hierarchy 


GI studies have focused on learning REG and CF languages (i.e., the first two
levels in the Chomsky Hierarchy) [6, 17]. However, the Chomsky Hierarchy 
has some limitations that should be taken into account when we want to study 
natural language syntax. One of the main limitations emerges when we try to 
locate natural languages in this hierarchy. 

The question of determining the location of natural languages in the Chom- 
sky Hierarchy has been a subject of discussion since it was posed by Chomsky 
in [3]. This debate focused on the following question: "Are natural languages
CF?" However, in the late 80s, some clear examples of natural language
structures that cannot be described using a context-free grammar were discovered
(some examples of such constructions can be found in [11]). Linguists then 
agreed that natural languages are not CF. 

It is worth noting that although the family of CF does not contain some 
important formal languages that appear in human languages, it has good 
computational properties. The family of context-sensitive languages contains 
all important constructions that occur in natural languages, but it is believed 
that the membership problem for languages in this family cannot be solved 
in deterministic polynomial time. Therefore, the question now is: "How much
power beyond context-free is necessary to describe these non-context-free
constructions that appear in natural language?"

The idea of generating CF and non-CF structures, while keeping the
generative power under control, has led to the notion of Mildly
Context-Sensitive (MCS) families, originally introduced by A.K. Joshi [9].

Definition 1. By a Mildly Context-Sensitive family of languages we mean a
family $\mathcal{L}$ of languages that satisfies the following conditions:


(i) each language in $\mathcal{L}$ is semilinear,
(ii) for each language in $\mathcal{L}$ the membership problem is solvable in
     deterministic polynomial time,
(iii) $\mathcal{L}$ contains the following three non-context-free languages:
   - multiple agreements: $L_1 = \{a^n b^n c^n \mid n \ge 0\}$
   - crossed agreements: $L_2 = \{a^n b^m c^n d^m \mid n, m \ge 0\}$
   - duplication: $L_3 = \{ww \mid w \in \{a, b\}^*\}$


The mechanisms for generating MCS families are well known (e.g., tree
adjoining grammars [10], head grammars [16], combinatory categorial grammars
[19], etc.). All these studies are based on the idea that the class of natural
languages is located in the Chomsky Hierarchy between CF and CS (i.e., it
includes REG and CF, but it is included in CS). However, as some authors
have pointed out (see, for instance, [12]), this assumption is not necessarily
true, as natural languages could occupy an orthogonal position in the Chomsky
Hierarchy (i.e., the class contains some REG languages, some CF languages,
etc., but not all of them). In fact, we can find some constructions in natural
languages that are neither REG nor CF, and also some REG and CF constructions
that do not appear naturally in sentences.

Taking these ideas into account, we consider that the study of natural lan- 
guage syntax from a formal point of view should focus on mechanisms that 
generate MCS languages and occupy an orthogonal position in the Chom- 
sky Hierarchy. Unfortunately, most research on Grammatical Inference is not 
based on a class of languages with such features. 


4 P-dimensional External Contextual Grammars 


Contextual grammars were introduced by S. Marcus in [13], motivated by
natural language investigations (for instance, modelling the acceptance of a
word only in certain contexts). Roughly speaking, a contextual grammar produces
a language starting from a finite set of words (axioms) and iteratively adding
contexts (pairs of words) to the currently generated words. Unlike the Chomsky
grammars, contextual grammars do not involve nonterminals and they do not have
rules of derivation except one general rule: to adjoin contexts. In the
derivation process of contextual grammars, the contexts can be added in two
different ways: at the ends of the current string (these grammars are called
external), or inside the current string (internal grammars).

Many variants have been investigated [15]. One of them is the so-called
Many-dimensional External Contextual grammars. These grammars extend
the external contextual grammars, but work with vectors of words and vectors
of contexts. Their linguistic relevance has been investigated in [11].

Let $p \ge 1$ be a fixed integer, and let $\Sigma$ be an alphabet. A p-word
$x$ over $\Sigma$ is a p-dimensional vector whose components are words over
$\Sigma$, i.e., $x = (x_1, x_2, \ldots, x_p)$, where $x_i \in \Sigma^*$,
$1 \le i \le p$. A p-context $c$ over $\Sigma$ is a p-dimensional vector whose
components are contexts over $\Sigma$, i.e., $c = [c_1, c_2, \ldots, c_p]$ where
$c_i = (u_i, v_i)$, $u_i, v_i \in \Sigma^*$, $1 \le i \le p$. We denote
vectors of words with round brackets, and vectors of contexts with square
brackets.


Definition 2. Let $p \ge 1$ be an integer. A p-dimensional External Contextual
grammar is $G = (\Sigma, B, C)$, where $\Sigma$ is the alphabet of $G$, $B$ is a
finite set of p-words over $\Sigma$ called the base of $G$, and $C$ is a finite
set of p-contexts over $\Sigma$. $C$ is called the set of contexts of $G$.


The direct derivation relation with respect to $G$ is a binary relation between
p-words over $\Sigma$, denoted by $\Rightarrow_G$, or $\Rightarrow$ if $G$ is
understood from the context. Let $x = (x_1, x_2, \ldots, x_p)$ and
$y = (y_1, y_2, \ldots, y_p)$ be two p-words over $\Sigma$. By definition,
$x \Rightarrow_G y$ iff $y = (u_1 x_1 v_1, u_2 x_2 v_2, \ldots, u_p x_p v_p)$ for
some p-context $c = [(u_1, v_1), (u_2, v_2), \ldots, (u_p, v_p)] \in C$. The
derivation relation with respect to $G$, denoted by $\Rightarrow_G^*$, or
$\Rightarrow^*$ if no confusion is possible, is the reflexive and transitive
closure of $\Rightarrow_G$.


Definition 3. Let $G = (\Sigma, B, C)$ be a p-dimensional External Contextual
grammar. The language generated by $G$, denoted $L(G)$, is defined as:


$L(G) = \{y \in \Sigma^* \mid$ there exists $(x_1, x_2, \ldots, x_p) \in B$ such that
$(x_1, x_2, \ldots, x_p) \Rightarrow_G^* (y_1, y_2, \ldots, y_p)$ and $y = y_1 y_2 \cdots y_p\}$.


The family of all p-dimensional External Contextual languages is denoted
by $EC_p$.
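
The derivation relation and the generated language can be sketched in a few lines of Python. This is only an illustrative enumeration up to a bounded number of derivation steps, not an algorithm from the paper; the names `derive_step` and `generate` are ours, p-words are tuples of strings, p-contexts are tuples of (u, v) pairs, and the empty string stands for $\lambda$. The sample grammar is the two-dimensional grammar with base (ab, cd) used as an example in Section 5.2.

```python
def derive_step(pword, pcontext):
    """Apply one p-context [(u1, v1), ..., (up, vp)] to a p-word (x1, ..., xp)."""
    return tuple(u + x + v for x, (u, v) in zip(pword, pcontext))

def generate(base, contexts, max_steps):
    """Strings of L(G) obtainable with at most `max_steps` derivation steps."""
    current = set(base)
    reached = set(current)
    for _ in range(max_steps):
        current = {derive_step(x, c) for x in current for c in contexts}
        reached |= current
    # A string of the language is the concatenation of the p components.
    return {"".join(pword) for pword in reached}

base = [("ab", "cd")]
contexts = [
    (("a", ""), ("c", "")),   # c1 = [(a, lambda), (c, lambda)]
    (("", "b"), ("", "d")),   # c2 = [(lambda, b), (lambda, d)]
]

print(sorted(generate(base, contexts, 1), key=lambda s: (len(s), s)))
# ['abcd', 'aabccd', 'abbcdd']
```

Because each context is adjoined to all p components at once, the number of a's always equals the number of c's, and the number of b's equals the number of d's; this synchronisation is what lets the formalism go beyond CF.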


Remark 4.1 Any family $EC_p$ for $p \ge 2$ is a subfamily of linear simple matrix
languages (see [11]).


Definition 4. A Linear Simple Matrix Grammar of degree $n$, $n \ge 1$, is a
grammar $G = (N_1, \ldots, N_n, \Sigma, M, S)$, where:


• $N_1, \ldots, N_n$: nonterminal alphabets.
• $\Sigma$: terminal alphabet.
• $S$: start symbol.
• $M$: finite set of matrices of the form
  1. $(S \to A_1 \cdots A_n)$, for $A_i \in N_i$, $1 \le i \le n$, or
  2. $(A_1 \to x_1, A_2 \to x_2, \ldots, A_n \to x_n)$, for $A_i \in N_i$,
     $x_i \in \Sigma^*$, $1 \le i \le n$, or
  3. $(A_1 \to x_1 B_1 y_1, A_2 \to x_2 B_2 y_2, \ldots, A_n \to x_n B_n y_n)$, for
     $A_i, B_i \in N_i$, $x_i, y_i \in \Sigma^*$, $1 \le i \le n$.


Kudlek et al. [11] showed that for $p \ge 2$, the family $EC_p$ is an MCS family
of languages. They also showed that $EC_p$ occupies an orthogonal position
in the Chomsky Hierarchy. Therefore, $EC_p$ is a mechanism with the desired
properties described in Section 3.


5 The Simple p-Dimensional External Contextual Case 


Taking into account the relevant properties of $EC_p$ from a linguistic and
computational point of view, in this section we will study its learnability from
positive data.

One of the most important models investigated in GI is the model of 
identification in the limit, introduced by E.M. Gold in [7]. In this model, an 
infinite sequence of examples of the unknown language is presented to the 
learner, and its eventual or limiting behavior is used as the criterion of its 
success. 


Definition 5. Method M identifies language L in the limit if, after a finite
number of examples, M makes a correct guess and does not alter its guess
thereafter. A class of languages is identifiable in the limit if there is a method 
M such that given any language of the class and given any admissible example 
sequence for this language, M identifies the language in the limit. 


Two different learning settings are considered in this model: learning from
text, also known as learning from positive data, where only strings that belong
to the language are given to the learner; and learning from informant, where,
in addition to positive data, strings that do not belong to the language are
also given to the learner.
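
To make identification in the limit from text concrete, the following Python sketch shows a toy learner for the class of finite languages, which is a standard example of a class that can be learned in this setting. It is not a method from this paper: the learner `finite_language_learner` and the sample text are our own illustrative choices. After each example the learner simply guesses the set of strings seen so far.

```python
def finite_language_learner(text):
    """Yield a guess after each example: the set of strings seen so far."""
    seen = set()
    for example in text:
        seen.add(example)
        yield frozenset(seen)

target = {"a", "ab", "abb"}
# A finite prefix of an admissible text: only strings of the target occur,
# and every string of the target occurs at least once.
text = ["a", "ab", "a", "abb", "ab", "a", "abb"]

guesses = list(finite_language_learner(text))
first_correct = next(i for i, g in enumerate(guesses) if g == target)

print(first_correct)                                      # 3
print(all(g == target for g in guesses[first_correct:]))  # True
```

Once every string of the finite target has appeared, the guess is correct and never changes again, which is exactly the behaviour required by Definition 5. The same strategy never converges on an infinite language, since its guess keeps growing.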

Although it is desirable to learn from only positive data, Gold [7] proved
that superfinite classes (i.e., classes of languages that contain all finite
languages and at least one infinite language) are not identifiable in the limit
from positive data. This implies that even the smallest class in the Chomsky
Hierarchy (i.e., REG) is not identifiable in the limit from positive data.

According to the general definition, the $EC_p$ grammar family is superfinite,
since the base of $G$ can be any finite set of p-words. We denote by $p$ the
dimension and by $q$ the number of contexts.


Theorem 1. The class $EC_p$ is superfinite.


Proof. Let $p = q = 1$. For any finite set $S$ of strings over $\Sigma$, consider
an $EC_1$ grammar with base set $S$ and an empty context set. Such a grammar
generates the finite language $S$. An $EC_1$ grammar with base $\{(\lambda)\}$ and
context set $\{[(a, \lambda)]\}$ generates the infinite language $a^*$. Therefore,
the language class is superfinite.


Corollary 5.1 $EC_p$ is not identifiable in the limit from positive data.


Hence, we need to set some restrictions to make it possible to learn this 
class in the limit from only positive data. 


Definition 6. A Simple p-dimensional External Contextual grammar is
$G = (\Sigma, B, C)$, where $\Sigma$ is the alphabet of $G$, $B$ is a singleton
of p-words over $\Sigma$ called the base of $G$, and $C$ is a finite set of
p-contexts over $\Sigma$. $C$ is called the set of contexts of $G$.


Therefore, the family of Simple many-dimensional External Contextual languages
is a subfamily of $EC_p$. The main difference is that the base of a Simple
p-dimensional External Contextual grammar is restricted to a single p-word.

The family of all Simple p-dimensional External Contextual languages is
denoted by $SEC_p$.


5.1 Properties of $SEC_p$ grammars


Even if the base is a singleton, the family $SEC_p$ has several properties that
are very interesting from a linguistic point of view. Here we present some of
the most remarkable ones. On the basis of arguments analogous to those used
by Kudlek et al. in [11], we can establish the following theorems.


Theorem 2. For every integer $p \ge 2$, the family $SEC_p$ is an MCS family of
languages.


Proof. 1. $SEC_p \subseteq EC_p$ and $EC_p$ contains semilinear languages only
(see [11]).

2. By membership problem the following is understood: given a language
$L \subseteq \Sigma^*$ (defined by a certain type of grammar, automaton, etc.)
and a word $w \in \Sigma^*$, decide algorithmically whether $w$ is in $L$ or not.
Since the membership problem is polynomially decidable for $EC_p$, it follows
that each family $SEC_p$, $p \ge 1$, is parsable in polynomial time (see [11],
[14]).

3. The following languages are in $SEC_p$ for every $p \ge 2$:

- multiple agreements: $L_1 = \{a^n b^n c^n \mid n \ge 0\}$

- crossed agreements: $L_2 = \{a^n b^m c^n d^m \mid n, m \ge 0\}$

- duplication: $L_3 = \{ww \mid w \in \{a, b\}^*\}$

It is easy to construct $SEC_p$ grammars for each of these languages:


(i) $L_1 = \{a^n b^n c^n \mid n \ge 0\}$. It is generated by the $SEC_p$ grammar
$G_1 = (\{a, b, c\}, B, C)$, where:


(ii) $L_2 = \{a^n b^m c^n d^m \mid n, m \ge 0\}$. It is generated by the $SEC_p$
grammar $G_2 = (\{a, b, c, d\}, B, C)$, where:


(iii) $L_3 = \{ww \mid w \in \{a, b\}^*\}$. It is generated by the $SEC_p$ grammar
$G_3 = (\{a, b\}, B, C)$, where:

Moreover, $SEC_p$ occupies an orthogonal position in the Chomsky Hierarchy.


Theorem 3. 1. $SEC_p \subset CS$, for every $p \ge 1$.
2. Each family $SEC_p$, $p \ge 2$, is incomparable with the family CF. The family
$SEC_1$ is strictly contained in CF.
3. Each family $SEC_p$, $p \ge 1$, is incomparable with the family REG.

Proof. 1. Since no deletion is observed in the derivation process of a string
in a $SEC_p$ grammar, the first statement follows.


2. From Theorem 2 it follows that every family $SEC_p$, $p \ge 2$, contains
non-context-free languages. Consider now the context-free language
$L = \{a^n b^n \mid n \ge 0\}^*$. Assume that $L$ can be generated by a $SEC_p$
grammar $G = (\Sigma, B, C)$. Consider the following word from $L$:


$w = a^{n_1} b^{n_1} a^{n_2} b^{n_2} \cdots a^{n_r} b^{n_r}$,

where $p < r$. One can easily see that by pumping all occurring contexts
we cannot generate $w$, so $L$ is not in $SEC_p$, for any $p \ge 2$.


The second part of this statement follows from the fact that the family
of external contextual languages is equal to MinLIN, which is a strict
subfamily of LIN, incomparable with REG (see [11]).


3. Note that each family $SEC_p$, $p \ge 1$, contains nonregular languages. Now,
consider the regular language $L = a^* \cup b^*$. One can verify that $L$ is not in
$SEC_p$, for any $p \ge 1$.


Figure 1 shows the location of the $SEC_p$ family in the Chomsky Hierarchy.


Fig. 1. The $SEC_p$ family occupies an orthogonal position in the Chomsky hierarchy.


Moreover, the $SEC_p$ family has another property with regard to the $EC_p$
family. We can find some languages showing the proper inclusion:


$SEC_p \subset EC_p$

For example, consider $L = \{a, b, c\}$. It is generated by an $EC_p$ grammar, but
it can never be generated by a $SEC_p$ grammar: a singleton base yields only one
string without applying contexts, and applying a non-empty context yields a
strictly longer string. This demonstrates that $SEC_p$ is not superfinite.


5.2 Learnability of $SEC_p$ languages from only positive data


Shinohara [18] showed that the class of languages generated by CS grammars
with a fixed number of rules is learnable from only positive data. Hence, if
we can transform a given SEC grammar with dimension $p$ and degree $q$ into
an equivalent LSMG (linear simple matrix grammar [5]) with dimension $p'$
and degree $q'$, and this into an equivalent CS grammar with a fixed number
of rules, we will achieve our goal.

We give the following constructive argument to prove that
$SEC_{p,q} \subseteq LSMG_{p',q'} \subseteq$ CS grammars with a fixed number of rules.


First, we need to define $p$, $q$, $p'$ and $q'$.


(i) $SEC_{p,q}$:
    - $p$: dimension (in the same sense as $SEC_p$),
    - $q$: degree (the number of contexts).
(ii) $LSMG_{p',q'}$:
    - $p'$: number of nonterminals on the right-hand side of the unique rule of
      the LSMG starting with $S$,
    - $q'$: number of matrices.


Let $G = (\Sigma, B, C)$ be a $SEC_{p,q}$ grammar, where


- $B = \{(\gamma_1, \ldots, \gamma_p)\}$
- $C = \{c_1 = [(\alpha_1^1, \beta_1^1), \ldots, (\alpha_p^1, \beta_p^1)], \ldots,
  c_q = [(\alpha_1^q, \beta_1^q), \ldots, (\alpha_p^q, \beta_p^q)]\}$


We can transform this SEC grammar with dimension $p$ and degree $q$ into
an equivalent LSMG with dimension $p'$ and degree $q'$.


$G' = (N_1, \ldots, N_p, \Sigma, P, S)$, where


- $P = \{ (S \to A_1 \cdots A_p)$,
  $(A_1 \to \gamma_1, \ldots, A_p \to \gamma_p)$,
  $(A_1 \to \alpha_1^1 A_1 \beta_1^1, \ldots, A_p \to \alpha_p^1 A_p \beta_p^1)$,
  $\ldots$,
  $(A_1 \to \alpha_1^q A_1 \beta_1^q, \ldots, A_p \to \alpha_p^q A_p \beta_p^q) \}$

for $A_i \in N_i$, $\gamma_i, \alpha_i^j, \beta_i^j \in \Sigma^*$, $1 \le i \le p$,
$1 \le j \le q$.


The number of rules of an equivalent CSG will be proportional to $p' \cdot q'$.
Generally, there exists a CSG with a number of rules $\le k \cdot p' \cdot q'$
($k$ is a constant).


We now illustrate this method with a simple example. Consider a $SEC_{p,q}$
grammar with $p = 2$ and $q = 2$.


Let $G = (\{a, b, c, d\}, B, C)$ be a $SEC_{p,q}$ grammar, where

- $B = \{(ab, cd)\}$

- $C = \{c_1 = [(a, \lambda), (c, \lambda)], c_2 = [(\lambda, b), (\lambda, d)]\}$

Note that $L(G) = \{a^m b^n c^m d^n \mid m, n > 0\}$.

We can transform this SEC grammar with dimension $p$ and degree $q$ into
an equivalent LSMG with dimension $p'$ and degree $q'$.

$G' = (\{S, A, A'\}, \{a, b, c, d\}, P, S)$, where


- $P = \{ m_0: S \to AA'$,

  $m_1: (A \to ab, A' \to cd)$,
  $m_2: (A \to aA, A' \to cA')$,
  $m_3: (A \to Ab, A' \to A'd) \}$.

Now, we can construct a CSG: $G'' = (V_N, T, P', S)$, where
$V_N = \{S, A, A', B, R_1, R_2, R_3\}$


$P' = \{ S \to ABA'$,

  $AB \to abR_1$,   $R_1 b \to bR_1$,   $R_1 c \to cR_1$,   $bB \to Bb$,
  $AB \to aAR_2$,   $R_2 b \to bR_2$,   $R_2 c \to cR_2$,   $cB \to Bc$,
  $AB \to AbR_3$,   $R_3 b \to bR_3$,   $R_3 c \to cR_3$,


  $R_1 A' \to Bcd$,    $R_1 A' \to cd$,
  $R_2 A' \to BcA'$,   $R_2 A' \to cA'$,
  $R_3 A' \to BA'd$,   $R_3 A' \to A'd \}$


Note that the set of rules presented here may contain some redundancy.
However, we prioritized consistency in the way the corresponding CSGs are
constructed for general cases.


It is easy to prove that $L(G) = L(G') = L(G'')$. We will do it in two steps:


1. $L(G) \Leftrightarrow L(G')$
2. $L(G') \Leftrightarrow L(G'')$.


Proof 1
$L(G) \Rightarrow L(G')$


Let $G = (\Sigma, B, C)$ be a $SEC_{p,q}$ grammar such that $L(G) = L$. Define the
$LSMG_{p',q'}$ $G' = (N_1, \ldots, N_p, \Sigma, M, S)$ such that $A_i \in N_i$,
$1 \le i \le p$. The set $M$ contains the following matrices:


• $(S \to A_1 A_2 \cdots A_p)$. The number of nonterminals on the right-hand
  side of the unique rule of the $LSMG_{p',q'}$ starting with $S$ is equal to the
  dimension of the $SEC_{p,q}$. Therefore, $p = p'$.

• For the p-word $(x_1, x_2, \ldots, x_p)$, which constitutes the base of the
  $SEC_{p,q}$, $M$ contains the following matrix of rules:
  $(A_1 \to x_1, A_2 \to x_2, \ldots, A_p \to x_p)$. There is only one matrix of
  this kind because the base of the $SEC_{p,q}$ is a singleton (it has only one
  p-word).

• For each p-context $c = [(u_1, v_1), (u_2, v_2), \ldots, (u_p, v_p)] \in C$, $M$
  contains the matrix of rules:
  $(A_1 \to u_1 A_1 v_1, A_2 \to u_2 A_2 v_2, \ldots, A_p \to u_p A_p v_p)$.
  In this way, when we apply the contexts $c_1, c_2, \ldots, c_q$, we obtain the
  same result as when we apply the matrices $m_2, m_3, \ldots, m_{q+1}$,
  respectively.


It is easy to see that $L(G') = L$. By construction, for every $s \in L(G)$ there
exists a derivation of $s$ in $G'$.


$L(G) \Leftarrow L(G')$.


Let $G'$ be the $LSMG_{p',q'}$, with $L(G') = L$. We define a $SEC_{p,q}$ grammar
$G = (\Sigma, B, C)$ such that:


• For the matrix $(A_1 \to x_1, A_2 \to x_2, \ldots, A_p \to x_p) \in M$, $B$
  contains the p-word $(x_1, x_2, \ldots, x_p)$. Therefore, the elements of $B$
  coincide with the elements on the right-hand side of the matrix
  $(A_1 \to x_1, A_2 \to x_2, \ldots, A_p \to x_p)$.

• For each matrix of rules
  $(A_1 \to u_1 A_1 v_1, A_2 \to u_2 A_2 v_2, \ldots, A_p \to u_p A_p v_p) \in M$,
  the set $C$ of p-contexts contains
  $c = [(u_1, v_1), (u_2, v_2), \ldots, (u_p, v_p)]$. Therefore, the number of
  matrices is equal to the number of contexts + 1.


It is easy to verify that $L(G) = L$. By construction, for every $s \in L(G')$
there exists a derivation of $s$ in $G$.


Proof 2
(i) $L(G') \Rightarrow L(G'')$


Let $G' = (N_1, \ldots, N_p, \Sigma, M, S)$ be an $LSMG_{p',q'}$ such that
$L(G') = L$. Define the CSG $G'' = (N, \Sigma, P, S)$, where: $N$ is a finite set
of nonterminal symbols, $\Sigma$ is a finite set of terminal symbols that is
disjoint from $N$, $P$ is a finite set of production rules and $S \in N$ is the
start symbol. The set $P$ contains the following rules:

• $S \to A_1 B A_2 A_3 \cdots A_p$. The right-hand side of this rule coincides
  with the right-hand side of the unique rule starting with $S$ of the
  $LSMG_{p',q'}$. We add the nonterminal $B$ when $p \ge 2$, to allow
  applications of different rules.

• For each matrix of $M$, $P$ contains the following rules:

  • For the first rule of each matrix, $P$ contains:

    $A_1 B \to x_1 R_1$
    $A_1 B \to u_1 A_1 v_1 R_2$
    $\ldots$
    $A_1 B \to u_1 A_1 v_1 R_{q'}$

    $q'$ is the number of matrices. So, there is a correspondence between
    choosing the rule that contains $R_1$, for example, and applying matrix
    $m_1$.

  • For the second rule of each matrix, $P$ contains:

    $R_1 A_2 \to x_2 R_1$
    $R_2 A_2 \to u_2 A_2 v_2 R_2$
    $\ldots$
    $R_{q'} A_2 \to u_2 A_2 v_2 R_{q'}$

    We apply this kind of rule from the second to the $(p-1)$-th rule of each
    matrix (note that each matrix has $p$ rules).

  • For the $p$-th rule of each matrix, $P$ contains:

    $R_1 A_p \to B x_p \mid x_p$
    $R_2 A_p \to B u_p A_p v_p \mid u_p A_p v_p$
    $\ldots$
    $R_{q'} A_p \to B u_p A_p v_p \mid u_p A_p v_p$.

    If we use the rule that contains the nonterminal $B$, we will go back
    and apply more rules. Otherwise, we will finish the derivation.

  • We will need to add some intermediate rules to allow us to make the
    necessary derivations. These rules do not have any correspondence
    with the $LSMG_{p',q'}$. With these intermediate rules, we move $R_i$ to
    the right until it is adjacent to an $A_j$, allowing us to apply another
    rule. Similarly, we move $B$ to the left until it is adjacent to $A_1$, and
    then start to apply this process again.


It is easy to see that $L(G'') = L$. By construction, for every $s \in L(G')$
there exists a derivation of $s$ in $G''$.


(ii) $L(G') \Leftarrow L(G'')$


Let $G''$ be the CSG, with $L(G'') = L$. We define an $LSMG_{p',q'}$
$G' = (N_1, \ldots, N_p, \Sigma, M, S)$ such that:

• For the unique rule starting with $S$ of the CSG, $M$ contains the same
  rule without the nonterminal $B$.

• For all the rules that contain $R_i$ in the CSG (except intermediate
  rules), where $1 \le i \le q'$, $M$ contains a matrix with all these rules, but
  $B$, $R_i$ and repeated rules are deleted.


It is easy to verify that $L(G') = L$. By construction, for every $s \in L(G'')$
there exists a derivation of $s$ in $G'$.


Hence, there are clear relationships between $SEC_{p,q}$, $LSMG_{p',q'}$ and
CSG:


(i) $p' = p$ (in our example, $p$ is equal to 2; therefore, the number of
nonterminals on the right-hand side of the unique rule of the LSMG starting
with $S$ is 2).
(ii) $q' = q + 1$ (in our example, $q$ is equal to 2; therefore, the number of
matrices of the LSMG has to be 3).
(iii) The fixed number of rules of the CSG is proportional to $p' \cdot q'$.
Generally, one can have $G''$ with $O(p' \cdot q')$ rules. Since $p'$ and $q'$ are
given, $G''$ has a bounded number of rules.


From a result by Shinohara [18], we obtain the following theorem:


Theorem 4. Given $p' > 0$ and $q' > 0$, the class of languages generated by
linear simple matrix grammars with dimension $p'$ and degree $q'$ is learnable
from positive data.


Corollary 5.2 Given $p > 0$ and $q > 0$, the class of languages generated
by simple external contextual grammars with dimension $p$ and degree $q$ is
learnable from positive data.


Although what we have proved is enough to show that SEC can be learned
from only positive data, we have a stronger result. As we will prove below,
SEC with any dimension, but with at most $q$ contexts and $m$ bases, has finite
elasticity (a sufficient condition for learning from positive data).


We will use the notation $\subset$ to mean a proper subset relation in the sequel.


By $Sec(p, q, m)$, we denote the class of languages that can be generated
by SECs with a dimension that is less than or equal to $p$, with at most $q$
contexts, and with at most $m$ bases. By $Sec(*, q, m)$, we denote the class of
languages defined by


$Sec(*, q, m) = \bigcup_{p=1}^{\infty} Sec(p, q, m)$.


Let $w$ be a string over $\Sigma$. A pair $(b, C)$ of a base $b$ and a set $C$ of
contexts is said to minimally generate $w$ if and only if $w$ is generated by
using the base $b$ and contexts in $C$, and there exist no $b'$ and $C'$ such that
$b = b'$, $C' \subset C$ and $w$ is generated by using $b'$ and $C'$. For a string
$w$, by $MinC(w)$ we denote the set of all pairs $(b, C)$ ($b$: base, $C$: set of
contexts) which minimally generate $w$. It is clear that the following lemma
holds:


Lemma 1. For any $w \in \Sigma^*$, $MinC(w)$ is finite.


Theorem 5. The class Sec(*,q,m) has finite elasticity. Therefore, it is iden- 
tifiable in the limit from positive data. 


Proof. Assume that the class $Sec(*, q, m)$ has infinite elasticity.

Then there exist an infinite sequence $w_0, w_1, w_2, \ldots$ of strings in
$\Sigma^*$ and an infinite sequence $L_1, L_2, \ldots$ of languages in
$Sec(*, q, m)$ such that, for any $k \ge 1$,
$\{w_0, w_1, \ldots, w_{k-1}\} \subseteq L_k$ and $w_k \notin L_k$ hold.

For each $i = 1, 2, \ldots$, let $S_i$ be some SEC generating $L_i$. Note that each
$S_i$ includes some element of $MinC(w_0)$ in its base and context set. Since
$MinC(w_0)$ is finite by the above lemma, there exists $C_0 \in MinC(w_0)$ such
that infinitely many $S_i$'s include $C_0$. Let $\sigma = S_{n_1}, S_{n_2}, \ldots$
be an infinite sequence of such SECs including $C_0$. Note that $\sigma$ is a
subsequence of $S_1, S_2, \ldots$. (That is, $n_1, n_2, \ldots$ is a subsequence
of $1, 2, 3, \ldots$.)

The string $w_{n_1}$ is not an element of $L_{n_1}$. Therefore, $w_{n_1}$ is not
generated by $S_{n_1}$. But the SECs of the infinite subsequence
$S_{n_2}, S_{n_3}, S_{n_4}, \ldots$ should generate $w_{n_1}$, and therefore should
include some element of $MinC(w_{n_1})$ in their base and context set. Since
$MinC(w_{n_1})$ is finite, there exists $C_1 \in MinC(w_{n_1})$ such
that infinitely many $S_{n_j}$'s include $C_1$. Note that $C_0$ does not generate
$w_{n_1}$. Therefore, $|C_0| < |C_0 \cup C_1|$ holds.

Repeating the same discussion, we can find an infinite sequence $C_0, C_1, \ldots$
satisfying the following conditions:


1. $|C_0| < |C_0 \cup C_1| < |C_0 \cup C_1 \cup C_2| < \cdots$ holds,
2. for any $j$, there exist infinitely many SECs in $S_1, S_2, \ldots$ which
   include $C_0 \cup \cdots \cup C_j$ as their base and context set.


These conditions contradict the fact that the number of contexts and bases
are upper bounded by $q$ and $m$, respectively. This completes the proof.


6 Concluding Remarks 


Despite the fact that REG and CF grammars are mechanisms with limited 
representational power to describe some constructions that appear in natural 
languages, GI studies have focused on them. In this paper we have proposed 
to study classes of languages that are more relevant from a linguistic point of 
view. 

On one hand, we have seen that MCS languages provide a grammatical 
environment for natural language constructions. On the other hand, we have 
given some arguments that support the idea that natural languages could oc- 
cupy an orthogonal position in the Chomsky Hierarchy. Therefore, it would be 
very interesting to study mechanisms with these properties (i.e., they
generate MCS languages and they occupy an orthogonal position in the Chomsky
Hierarchy).

P-dimensional External Contextual grammars are an example of a mechanism
with such features. Hence, we believe they could play a role in the
study of natural language syntax. However, in order to study their
learnability from positive data only, we have to restrict the grammar. So,
we have introduced a new class of languages called Simple External
Contextual. Using Shinohara's results [18], we have shown that this class
with fixed dimension and degree is learnable from positive data. Moreover,
we have presented a second, stronger result showing that the class of Simple
External Contextual languages with any dimension, but at most q contexts and
m bases, has finite elasticity (a sufficient condition for learnability from
positive data).

In the future, we would like to gain a better understanding of the
properties of the proposed class and to extend these learnability results.
Moreover, taking into account that corrections are also available to the
child in the early stages of language acquisition, and that the idea of
corrections has been successfully applied to learn REG languages [1], we
would like to study the learnability of Simple External Contextual languages
using positive data and corrections.

Summary. The aim of this paper is to show how grammar systems and concur-
rent programs might be viewed as related models for distributed and cooperating
computation. We argue that it is possible to translate a grammar system into a
concurrent program, where the Owicki-Gries theory and other tools available in the
programming framework can be used. The converse translation is also possible and
this turns out to be useful when we are looking for a grammar system that can
generate a given language.

In order to show this we use tools from concurrent programming theory to prove
that $L_{cd} = \{a^n b^m c^n d^m \mid n, m \geq 1\}$ can be generated by a
non-returning Parallel Communicating grammar system with three regular
components. We show that this strategy can be helpful in the construction of
grammar systems that generate strings in less time and more efficiently. We
also discuss the absence of strategies in concurrent programming theory to
prove that $L_{cd}$ cannot be generated by any Parallel Communicating grammar
system with two regular components.


1 Introduction


At the beginning of computation theory, classic computing devices were cen- 
tralized: that is, the computation was accomplished by one central processor. 
But in contemporary computer science distributed computing systems that 
consist of multiple communicating processors play a major role because they 
have various advantages: efficiency, fault tolerance, scalability in the relation 
between price and performance, etc. 

Since 1960, when the concept of concurrent programming [6] was intro- 
duced, a huge variety of topics related to parallelism and concurrency have 
been defined and investigated: for example, operating systems, machine ar- 
chitectures, communication networks, circuit design, protocols for commu- 
nication and synchronization, distributed algorithms, logics for concurrency, 
automatic verification and model checking. The same trend has been observed 
in classic formal language and automata theory as well. At first, grammars 
and automata modelled classic computing devices of one agent or processor, 
so a language was generated by one grammar or recognized by one automaton. 
Inspired by different models of distributed systems in Artificial Intelligence, 
grammar systems theory [4] has been developed as a grammatical theory for 
distributed and parallel computation. More recently, similar approaches have 
been reported for systems of automata [12]. 

In the concurrent programming framework, the Owicki-Gries theory [13], the
first complete programming logic for the formal development of concurrent
programs, and other programming strategies were developed to help program-
mers analyse and design multiprograms. We argue that grammar system theory
can benefit from these tools. For example: given a grammar system, one can
prove that it generates a specific language by direct reasoning, or one can
translate the grammar system into a multiprogram and prove the same statement
by some of the programming strategies developed in the Owicki-Gries theory.
We exemplify this with the language $\{a^n b^n c^n \mid n > 0\}$. Furthermore,
we propose another approach to solve problems of the following type: given a
language specification, find a grammar system that generates the given
language. The strategy widely used so far is as follows: first, propose a
grammar system and then prove by means of language theory that the proposed
grammar system does indeed generate the given language. We give three
examples of how the Owicki-Gries logic of programming could help us to
simultaneously obtain a grammar system that generates the given language and
the proof that it really generates it. This new approach might be of great
benefit for grammar systems theory. The strategy consists of translating the
problem of finding a grammar system $\Gamma$ of a certain type that generates
a language L into the problem of finding a multiprogram P. P will have as
many programs $Prog_i$ as the grammar system $\Gamma$ has grammars and will
have to be correct with respect to the specification:


$\{(w_1 = S_1) \land (w_2 = S_2) \land \ldots \land (w_n = S_n) \land n \geq 1\}\; P\; \{w_1 \in L\}.$


Then this multiprogram will be translated back into the grammar system
$\Gamma$, the whole behavior of $\Gamma$ being similar to that of P. Actually,
the language generated by $\Gamma$ is included in L, but for the examples we
present here equality is reached, as detailed reasonings prove.

Here we show how to apply this strategy for a well-known non-context-free
language, namely $L_{cd} = \{a^n b^m c^n d^m \mid n, m \geq 1\}$. In [2] it was
proved that $L_{cd}$ can be generated by a Nonreturning Centralized Parallel
Communicating grammar system with four context free components
($L_{cd} \in NCPC_4(CF)$). In [8] we improved this result showing that
$L_{cd} \in NPC_3(REG)$, based on a similar strategy. Here we show the proof
of this result. We also exemplify with the language
$\{xcxc \mid x \in \{a,b\}^*\}$ the use of the Owicki-Gries strategy to get
more efficient grammar systems when combined with some other programming
techniques used to improve parallelism.

Finally we show how the concurrent programming framework can benefit
from grammar system theory to get negative results: in the programming
framework we have no strategy to deal with negative results of the type "a
given language cannot be generated by any grammar system of a specified
type". This kind of problem has to be analyzed in the grammar system
framework, with the tools available there.


2 Grammar System Theory: Models 


2.1 Cooperative Distributed (CD) grammar systems 


Grammar system theory started in 1988 with the introduction of CD grammar
systems for modelling syntactic aspects of the blackboard model of problem
solving [3]. A CD grammar system is a finite set of (usually generative)
grammars which cooperate in deriving words of a common language. At any
moment in time there is exactly one sentential form in generation. The
component grammars generate the string by turns, under a cooperation
protocol, called the derivation mode. In this model the cooperating grammars
represent independent cooperating problem solving agents which jointly solve
a problem. They modify the contents of a global database, called the
blackboard, which is used for storing information on the problem solving
process. In blackboard architectures the agents communicate with each other
only through the blackboard: that is, there is no direct communication among
them.

We do not give here the formal definition of this model, but the reader 
is referred to [4] for all the formal concepts related to all grammar systems 
models mentioned in this paper. 

We fix the notation for the class of languages generated by homogeneous
Cooperative Distributed grammar systems. We denote them as $CD_n(f)$, where
$n \in \mathbb{N}$ is the maximum number of grammar components with context
free productions and
$f \in \{t, *\} \cup \{\leq k, = k, \geq k \mid k \geq 1\}$ is the mode of
derivation.


2.2 Networks of language processors 


Networks of language processors form an essential area in the theory of gram- 
mar systems. Language processors, that is grammars or other language de- 
termining devices, are located in nodes of a network (a virtual graph). Each 
processor works on its own sentential form (on its own collection of senten- 
tial forms) and informs the others about its activity by communicating strings 
which can be data and/or programs. Rewriting and communication take place 
alternately, and the system functions (usually) in a synchronized manner. 

The difference between CD grammar systems and these architectures is 
that while in the first case the grammars generate a common string, in the sec- 
ond case each of them operates on its own string. There are several important 
models in the area, of which we are interested in two: Parallel Communicating 
(PC) grammar systems and Parallel Communicating grammar systems with 
Communication by Command (CCPC). 

Parallel communicating grammar systems were introduced in [15] as a 
grammatical representation of the so-called “classroom model” of problem 
solving, which is a modification of the blackboard model. 

We denote by $PC_n(Y)$ the class of languages generated by non-centralized
Parallel Communicating grammar systems with at most n components, each
component with productions of type Y, where $n \in \mathbb{N}$ and
$Y \in \{FIN, REG, CF, CS, RE\}$. When the PC grammar system is centralized,
non-returning non-centralized or non-returning centralized, the prefixes C,
N and NC, respectively, are added.

We denote by $CCPC_n(Y)$ the class of languages generated by Parallel
Communicating grammar systems with Communication by Command with at most n
components, each component with productions of type Y, where
$n \in \mathbb{N}$ and $Y \in \{FIN, REG, CF, CS, RE\}$.


3 Programming 
3.1 Sequential programming 


A sequential program consists of a number of declarations and a sequence of 
instructions or actions. The actions take place one after another. That is, an 
action does not begin until the preceding one has ended. Because a sequential 
program has a sequence of actions we consider a program as a transformer of 
states or predicates [9], where a state {P} describes the relationships between 
the variables of the systems and their values by the predicate P . Each action 
S transforms the current state of the system, called precondition of S, to the 
state {Q} which is called postcondition. 
A Hoare triple is a sequence {P}S {Q}, where: 


— S is an action or instruction,
— {P} is a state representing the precondition of S, 
— {Q} is a state representing the postcondition of S. 


Its operational interpretation is as follows: {P}S {Q} is a correct Hoare triple 
if and only if it is true that each terminating execution of S that starts from 
a state satisfying P is guaranteed to end up in a state satisfying Q. More 
precisely, if {P}S{Q} holds and S starts in a state satisfying P, we can be 
sure that S either terminates in a state satisfying Q or does not terminate 
at all. Consequently, a program ought to be annotated in such a way that 
each action carries a precondition. In other words, from a logical perspective 
a sequential program may be viewed as a sequence of Hoare triples. 
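
The operational reading of a Hoare triple can be made concrete on a finite state space. The following Python sketch is only an illustration: the helper name `holds_triple`, the encoding of an action as a function from a state to the set of its possible final states, and the integer example are assumptions introduced here. A non-terminating execution is modelled by an empty set of final states, so the check expresses partial correctness.

```python
def holds_triple(pre, action, post, states):
    """Check {pre} action {post} by brute force over a finite set of start states.

    `action(s)` returns the set of states in which a terminating execution
    started in `s` may end; an empty set models non-termination.
    """
    return all(post(t) for s in states if pre(s) for t in action(s))


# Hypothetical example: the action x := x + 1 on an integer variable x.
def increment(x):
    return {x + 1}


def diverge(x):
    return set()  # no terminating execution at all


STATES = range(-5, 6)

# {x >= 0} x := x + 1 {x >= 1} is a correct triple.
ok = holds_triple(lambda x: x >= 0, increment, lambda x: x >= 1, STATES)
# {true} x := x + 1 {x >= 1} is not: the precondition is too weak.
too_weak = holds_triple(lambda x: True, increment, lambda x: x >= 1, STATES)
# A triple over a never-terminating action holds vacuously.
vacuous = holds_triple(lambda x: True, diverge, lambda x: False, STATES)
```

The third case mirrors the remark above that S "either terminates in a state satisfying Q or does not terminate at all".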

We can now formulate the concept of local correctness of a predicate Q in 
a program. We distinguish two cases: 


— If Q is the initial predicate of the program, it is locally correct whenever 
it is implied by the precondition of the program as a whole. We may also say 
that Q satisfies the hypothesis of the problem which is to be solved. 

— If Q is preceded by {P}S, i.e. by atomic action S with precondition P,
it is locally correct whenever {P}S {Q} is a correct Hoare-triple.

A sequential program is partially locally correct if all its predicates are locally
correct and the last predicate satisfies the requirements of the problem solved,
provided that it halts. A sequential program is totally locally correct if it is
partially locally correct and always halts.


3.2 Concurrent programming 


Concurrent execution or multiprogramming means that various sequential 
programs run simultaneously. Actions change the state of the multiprogram, 
so the critical question now is what happens if two overlapping actions change 
the same state of the multiprogram in a conflicting manner. 

Now we are ready to formulate what we call the core of the Owicki- 
Gries theory. We consider a multiprogram annotated in such a way that 
the annotation provides a precondition for the multiprogram as a whole and 
a precondition for each action in each individual program. Then, by Owicki 
and Gries, a multiprogram is correct whenever each individual predicate is 
correct, i.e.: 


— locally correct as described above and 

— globally correct: a predicate Q in a multiprogram M is globally correct
whenever for each {P}S, i.e. for each action S with precondition P, taken
from a program of M, $\{P \land Q\}\, S\, \{Q\}$ is a correct Hoare-triple.
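
Global correctness says that no action of the multiprogram can falsify a predicate once it holds. A minimal Python sketch of this check follows, under the assumption that states form a small finite set and that each annotated action is given as a pair (precondition, successor function); the names `holds_triple`, `globally_correct` and the shared-integer example are hypothetical.

```python
def holds_triple(pre, action, post, states):
    """Brute-force check of {pre} action {post} over a finite set of states."""
    return all(post(t) for s in states if pre(s) for t in action(s))


def globally_correct(q, annotated_actions, states):
    """q is globally correct if {p and q} s {q} holds for every action (p, s)."""
    return all(
        holds_triple(lambda st, p=p: p(st) and q(st), s, q, states)
        for p, s in annotated_actions
    )


# Hypothetical multiprogram over one shared integer x.
ACTIONS = [
    (lambda x: x >= 0, lambda x: {x + 1}),  # {x >= 0} x := x + 1
    (lambda x: x >= 0, lambda x: {x + 2}),  # {x >= 0} x := x + 2
]
STATES = range(-5, 6)

# x >= 1 survives every action, so it is globally correct.
stable = globally_correct(lambda x: x >= 1, ACTIONS, STATES)
# x == 1 is destroyed by either assignment, so it is not.
fragile = globally_correct(lambda x: x == 1, ACTIONS, STATES)
```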


4 How to Relate Grammar Systems with Programming 


4.1 How can grammar systems benefit from programming? 


In this section we exemplify possible ways in which grammar systems can 
benefit from the Owicki-Gries theory and from some strategies of proof used 
in the formal analysis of concurrent programs. 


Example 1. We introduce the grammar system $\Gamma \in CD_2(=2)$ defined in
this way:


$\Gamma = (\{a, b, c\}, (\{S, A, A', B, B'\}, \emptyset, P_1, =2), (\{S, A, A', B, B'\}, \{a, b, c\}, P_2, =2), S)$


where:

$P_1 = \{S \to S,\; S \to AB,\; A' \to A,\; B' \to B\}$,

$P_2 = \{A \to aA'b,\; B \to cB',\; A \to ab,\; B \to c\}$.

We transcribe the proof taken from [5] that all the derivations in $\Gamma$
are of the form:


$S \Rightarrow_{P_1} S \Rightarrow_{P_1} AB \Rightarrow_{P_2} aA'bB \Rightarrow_{P_2} aA'bcB' \Rightarrow^{=2}_{P_1} aAbcB \Rightarrow_{P_2} \cdots$

$\cdots \Rightarrow_{P_2} a^n A' b^n c^n B' \Rightarrow^{=2}_{P_1} a^n A b^n c^n B \Rightarrow^{=2}_{P_2} a^{n+1} b^{n+1} c^{n+1}$


for some n > 0. Hence $L_{=2}(\Gamma) = \{a^n b^n c^n \mid n > 0\}$.
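
The claimed language can be cross-checked by brute force for small n. The Python sketch below simulates the two components in the =2 mode, where a component that takes over must perform exactly two rewriting steps. It is an illustration rather than the method of [5]: the function names, the token encoding (A' and B' are the strings "A'" and "B'") and the length bound `max_len` are assumptions made here.

```python
from collections import deque

# Productions of the two components; sentential forms are tuples of tokens.
P1 = [("S", ["S"]), ("S", ["A", "B"]), ("A'", ["A"]), ("B'", ["B"])]
P2 = [("A", ["a", "A'", "b"]), ("B", ["c", "B'"]), ("A", ["a", "b"]), ("B", ["c"])]
TERMINALS = {"a", "b", "c"}


def one_step(form, rules):
    """All forms obtained from `form` by applying one production of `rules`."""
    out = set()
    for i, sym in enumerate(form):
        for lhs, rhs in rules:
            if sym == lhs:
                out.add(form[:i] + tuple(rhs) + form[i + 1:])
    return out


def exactly_k_steps(form, rules, k):
    """All forms reachable by exactly k consecutive steps of one component."""
    forms = {form}
    for _ in range(k):
        forms = {g for f in forms for g in one_step(f, rules)}
    return forms


def language(max_len, k=2):
    """Terminal words derivable in the =k mode using forms of at most max_len tokens."""
    start = ("S",)
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        for rules in (P1, P2):
            for nxt in exactly_k_steps(form, rules, k):
                if len(nxt) > max_len or nxt in seen:
                    continue
                seen.add(nxt)
                if all(s in TERMINALS for s in nxt):
                    words.add("".join(nxt))
                else:
                    queue.append(nxt)
    return words


found = language(14)
```

With the bound 14, the search covers the sentential forms needed for the words with n from 1 to 4. Forms with a single remaining nonterminal are reached but never extended, which matches the observation that no component can be applied to them in the =2 mode.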


To show that the previous sequence of derivation is correct, and that it 
is the only possible sequence of derivation, we analyze all possible cases by 
applying the technique of analysis by cases. 

We have to start from S. Only $P_1$ can be used. Applying the rule
$S \to S$ twice changes nothing, so eventually we shall perform the step


$S \Rightarrow_{P_1} S \Rightarrow_{P_1} AB$

From now on, S will never appear again. Only $P_2$ can be applied to AB.
If we use the nonterminal rules, we get:
$AB \Rightarrow aA'bB \Rightarrow aA'bcB'$


In general, from a string of the form $a^i A b^j c^k B$ (initially we have
i = j = k = 0), we can obtain
$a^i A b^j c^k B \Rightarrow^{=2}_{P_2} a^{i+1} A' b^{j+1} c^{k+1} B'$


To such a string we have to apply $P_1$ again so we get:


$a^{i+1} A' b^{j+1} c^{k+1} B' \Rightarrow^{=2}_{P_1} a^{i+1} A b^{j+1} c^{k+1} B$

This is the only possibility of using $P_1$. However, $P_2$ can also be
applied to a string $a^i A b^j c^k B$ in the =2 mode using only one
nonterminal rule (replacing either A or B by A' or B', respectively) and one
terminal rule (removing the remaining symbol A or B). To a string containing
only one nonterminal (which is different from S), neither of the two
components can be applied. Consequently, we have to use, in turn, the first
component and the nonterminal rules of the second one, and we have to finish
the derivation by using the terminal rules of $P_2$.

Now we present an alternative way to solve the problem introduced above.
We show that it is possible to automatically translate a grammar system to a
concurrent program, and we use the Owicki-Gries theory to give our proofs.

We use the CD grammar system given above to exemplify this. So first we
make the automatic translation of $\Gamma$ to a concurrent program:


P: {Begin Main Program}
   {declaration of variables}
   w := S;
   P: {w = S}
   do belong(w, non_terminals) →
       Prog_1 | Prog_2
   od
   Q: {w ∈ {a^n b^n c^n | n > 0}}
{End Main Program}


For this example the translation generates programs $Prog_1$ and $Prog_2$,
one program $Prog_i$ for each grammar $G_i$. A global variable w, whose
initial value is S, has been introduced to represent the current sentential
form that all programs $Prog_i$ can access and modify. The fact that in the
grammar system $\Gamma$ derivation finishes when a string of terminals is
generated is modelled here by the cyclic instruction
"do Condition → Instructions od", which iterates while the Condition is
satisfied, in this case while string w contains nonterminals. The set of
productions $P_i$ of each $G_i$ is represented by the alternative
construction called "if", which executes the assignment to the right of an
arrow if the predicate to the left of that arrow is true. More than one
predicate to the left of the arrows can be true, and in this case one is
chosen nondeterministically.

The =2 mode of derivation of $G_1$ and $G_2$ is preserved by adding the
variable cont, which counts the number of derivation steps performed on the
sentential form. If the number of steps is different from two, the programs
abort. Also the symbols < and > enclose programs $Prog_1$ and $Prog_2$ to
denote that these programs are considered atomic instructions. This means
that once the processor is assigned to the program, it cannot be released or
reassigned to another program before its execution finishes.

Prog_1: {program for G_1}
    cont : int;
    < cont := 0;
      do (w = xSy ∨ w = xA'y ∨ w = xB'y) ∧ cont ≠ 2 →
          if w = xSy → w := xSy; cont := cont + 1;
             w = xSy → w := xABy; cont := cont + 1;
             w = xA'y → w := xAy; cont := cont + 1;
             w = xB'y → w := xBy; cont := cont + 1;
          fi
      od;
      if (cont ≠ 2) → abort >
{end program Prog_1}


Prog_2: {program for G_2}
    cont : int;
    < cont := 0;
      do (w = xAy ∨ w = xBy) ∧ cont ≠ 2 →
          if w = xAy → w := xaA'by; cont := cont + 1;
             w = xBy → w := xcB'y; cont := cont + 1;
             w = xAy → w := xaby; cont := cont + 1;
             w = xBy → w := xcy; cont := cont + 1;
          fi
      od;
      if (cont ≠ 2) → abort >
{end program Prog_2}

So we have automatically generated a program $Prog_i$ for each grammar
$G_i$, where each program $Prog_i$ modifies the global variable w in the same
way as the grammar $G_i$ modifies the current sentential form and preserves
the mode of derivation of $G_i$.

What remains to be done is to prove that when the programs $Prog_1$ and
$Prog_2$ that we have defined run concurrently, they behave like $\Gamma$.
According to the Owicki-Gries theory that we have introduced, this is
equivalent to proving the global correctness of the multiprogram with respect
to the precondition {w = S} and postcondition
$\{w \in \{a^n b^n c^n \mid n > 0\}\}$.

The analysis needed for this proof is similar to the one we showed above,
but carried out in the programming framework. For some problems, like this
one, the Owicki-Gries theory also offers the possibility of using the
so-called System Invariant strategy. To apply this strategy we need to find
a predicate that remains invariant throughout the whole computation and that
synthesizes the behavior of the multiprogram. If we find it, the number of
proofs is reduced to a linear size.

Definition 1. By definition a relation I is a system invariant whenever:

- it holds initially, i.e. is implied by the precondition of the multiprogram
as a whole,

- it is maintained by each individual atomic statement {Q}S of each indi-
vidual component, i.e. whenever for each such {Q}S,
$\{I \land Q\}\, S\, \{I\}$ is a correct Hoare-triple.


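For the previous program P we can give this system invariant:


Inv I: $[\, w = S \lor (w = a^n A' b^n c^n B' \land n > 0) \lor (w = a^n A b^n c^n B \land n \geq 0) \lor (w = a^n A' b^n c^n \land n \geq 1) \lor (w = a^n b^n c^n B' \land n \geq 1) \lor (w = a^n b^n c^n \land n \geq 1) \,]$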


Proving that the predicate I is invariant is equivalent to proving that:


1. $(w = S) \Rightarrow I$,
2. $\{I\}\, Prog_1\, \{I\}$,
3. $\{I\}\, Prog_2\, \{I\}$,
4. $I \land (P \text{ terminates}) \Rightarrow (w = a^n b^n c^n \land n \geq 1)$.


While the proof of 1 and 4 is trivial, the proof of 2 and 3 requires an
analysis by cases checking that $Prog_1$ and $Prog_2$ always rewrite strings
satisfying the invariant into new strings that satisfy the invariant. For
reasons of space we do not provide the proof here.
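
Although the proof is omitted, conditions 1 to 4 can be tested mechanically for small n. The Python sketch below does so under stated assumptions: A' and B' are encoded as the single letters X and Y, each atomic program is modelled as "exactly two rewriting steps, otherwise abort", the disjunct with A and B is read with n ≥ 0 (so that w = AB is covered), and a final disjunct w = a^n b^n c^n with n ≥ 1 is assumed. All function names are hypothetical.

```python
import re

# Encoding assumption: X stands for A' and Y stands for B'.
P1 = [("S", "S"), ("S", "AB"), ("X", "A"), ("Y", "B")]
P2 = [("A", "aXb"), ("B", "cY"), ("A", "ab"), ("B", "c")]


def step(w, rules):
    """All strings obtained from w by one rewriting step with `rules`."""
    return {w[:i] + rhs + w[i + 1:]
            for i, ch in enumerate(w) for lhs, rhs in rules if ch == lhs}


def atomic(w, rules):
    """Outcomes of one atomic program: exactly two steps (aborted runs yield nothing)."""
    return {v for u in step(w, rules) for v in step(u, rules)}


def invariant(w):
    """The system invariant I, as read under the assumptions stated above."""
    if w == "S":
        return True
    m = re.fullmatch(r"(a*)([AX]?)(b*)(c*)([BY]?)", w)
    if m is None:
        return False
    a, left, b, c, right = m.groups()
    n = len(a)
    if len(b) != n or len(c) != n:
        return False
    if (left, right) == ("A", "B"):
        return True  # n >= 0
    if (left, right) in {("X", "Y"), ("X", ""), ("", "Y"), ("", "")}:
        return n >= 1
    return False


def candidate_states(max_n):
    """All strings satisfying I whose exponent n is at most max_n."""
    yield "S"
    for n in range(max_n + 1):
        a, b, c = "a" * n, "b" * n, "c" * n
        for left, right in [("A", "B"), ("X", "Y"), ("X", ""), ("", "Y"), ("", "")]:
            w = a + left + b + c + right
            if invariant(w):
                yield w


def invariant_preserved(max_n):
    """Conditions 2 and 3: every atomic program maps I-states to I-states."""
    return all(invariant(v)
               for w in candidate_states(max_n)
               for rules in (P1, P2)
               for v in atomic(w, rules))


terminal_states = [w for w in candidate_states(5) if w.islower()]
```

Condition 1 is the first branch of `invariant`, and condition 4 corresponds to the observation that the only terminal strings among the candidate states are of the form a^n b^n c^n with n ≥ 1.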

With this example we have shown that it is possible to automatically
translate a CD grammar system $\Gamma$ to a concurrent program P. In this way
the problem of proving that $\Gamma$ generates a language L is transformed
into the problem of proving that the program P obtained from the translation
is correct with respect to the precondition {w = S} and the postcondition
$\{w \in L\}$. So, as well as analysis by cases, we now have the global
correctness strategy from the Owicki-Gries theory to prove that a grammar
system generates a language. For some problems, like this example, it is also
possible to prove global correctness through the system invariant strategy.

From this example we can point out some advantages of the system in- 
variant strategy over the analysis by cases technique: 


• Once the invariant predicate has been proposed the number of proofs to
  be made is linear, instead of the exponential number of proofs needed with
  analysis by cases.
• With analysis by cases we can capture the overall behavior of a system by
  a general sequence of derivations including detailed information, such as
  how grammars interact, which productions they apply, how they change
  the sentential form, etc. When we apply the system invariant technique we
  capture the overall behavior of the system by an invariant that shows all
  the possible values of the sentential form and hides information that the
  previous technique gives. So we can say that the system invariant captures
  the overall behavior in a more abstract way.

• With analysis by cases, apart from showing the shape that any sequence
  of derivations should have, we need to prove that this is the only possi-
  ble sequence of derivations, and add an explanation in natural language.
  With the system invariant technique we need to prove, in the framework of
  predicate calculus, that each program $Prog_i$ preserves the invariant.
  This is done by formal proofs.


Another benefit of the Owicki-Gries logic of programming is that it can be
used to prove certain matters related to dynamic aspects of a grammar system
such as: reachability of a configuration, absence of progress because of
circularity (in the case of PC grammar systems with communication by query),
deadlock, etc.

Another advantage of the Owicki-Gries logic of programming is that it
can help us to simultaneously construct a grammar system that generates a
given language specification and the proof that it generates it (see [1]).
This is a great improvement, because we did not have any techniques in the
framework of grammar systems to help us to solve this kind of problem. We
give two examples of the use of this strategy. The first example is taken
from [8]:


Theorem 1. $L_{cd} \in NPC_3(REG)$


Proof. We want to find a non-returning, non-centralized grammar system
$\Gamma$ with regular components that generates $L_{cd}$. This problem is
transformed into the equivalent problem of finding a multiprogram P that
behaves like $\Gamma$ and is correct with respect to the specification:


$\{(w_1 = S_1) \land (w_2 = S_2) \land \ldots \land (w_n = S_n) \land n \geq 1\}\; P\; \{w_1 \in L_{cd}\}.$


The problem remains the same, but we use different tools to solve it: in- 
stead of induction and analysis by cases available in the framework of gram- 
mar systems we use Logic, the Owicki-Gries theory and programming strategies 
from the programming framework. 


TRIANGLE 8 • June 2012


30 M.A. Grando 


The strategy we followed for this proof is frequently used for the devel- 
opment of programs. It is called refinement of the problem and consists of: 


(I) First, start with an outline of the solution, which identifies the basic prin- 
ciple by which the input can be transformed into the output. Define pre- and 
post- conditions for each of the subproblems that are identified as part of the 
solution for the whole problem. 


For our problem we propose this idea:
$\{(w_1 = S_1) \land (w_2 = S_2) \land \ldots \land (w_n = S_n)\}$
Subproblem 1: $(Rewrite)^p$, with $p \geq 1$
$\{(w_1 = S_1) \land \ldots \land (w_i = a^p S_i) \land \ldots \land (w_j = c^p S_j) \land \ldots \land (w_n = S_n) \land (p \geq 1)\}$
Subproblem 2: $(Rewrite; Communication)^*$
Find a way to stop the productions of a's and c's, through synchronization
by communication.
$\{(w_1 = a^r N_1) \land \ldots \land (w_k = c^r N_2) \land \ldots \land (w_n = S_n) \land (r \geq 1) \land (N_1, N_2 \in N)\}$
Subproblem 3: $(Rewrite)^m$, with $m \geq 1$
$\{(w_1 = a^r b^m Q_l) \land \ldots \land (w_l = c^r d^{m-1} N_3) \land \ldots \land (w_n = S_n) \land (r, m \geq 1) \land (N_3 \in N)\}$
Subproblem 4: Communication
$\{(w_1 = a^r b^m c^r d^{m-1} N_3) \land (r, m \geq 1) \land (N_3 \in N)\}$
Subproblem 5: Rewrite
$\{(w_1 = a^r b^m c^r d^m) \land (r, m \geq 1)\}$
or equivalently
$\{w_1 \in \{a^r b^m c^r d^m \mid r, m \geq 1\}\}$


(II) Now we make the outline indicated more precise and refine the
subproblems by trying to simultaneously find the instructions that solve the
subproblems and the proof of their local correctness. We also discuss the
difficulties we might have when proving overall correctness.

In the refinement of subproblems 1, 2, 3, 4 and 5 we propose three pro-
grams $Prog_1$, $Prog_2$ and $Prog_3$. These programs make up the
multiprogram P, run simultaneously and behave like a non-returning,
non-centralized grammar system with regular productions. With respect to the
subproblems we have identified in the step above, they are locally correct.

In the case of Subproblem 1 we propose this refinement:

$\{(w_1 = S_1) \land (w_2 = S_2) \land (w_3 = S_3)\}$

Subproblem 1: $Rewrite^n$, with $n \geq 1$

$Prog_1$ rewrites n − 1 times $S_1$ to $aS_1$ and then rewrites $S_1$ to aA,
$Prog_2$ rewrites n − 1 times $S_2$ to $cS_2$ and then rewrites $S_2$ to cB,
and $Prog_3$ rewrites n − 1 times $S_3$ to $S_3$, until it decides to finish
the production of a's and c's, rewriting $S_3$ to $Q_2$.

To be sure that $w_2 = c^n B$ when $Prog_3$ introduces $Q_2$, $Prog_3$ should
not be able to rewrite $S_2$, and after $Prog_2$ introduces B it should
rewrite it to another nonterminal and not introduce B any more.

The reason why $w_1 = a^n A$ and $w_1 \neq a^n S_1$ is that this is the only
possibility that does not lead to deadlock, as the states of the next
subproblem show.

$\{(w_1 = a^n A) \land (w_2 = c^n B) \land (w_3 = Q_2) \land (n \geq 1)\}$

For Subproblem 2 we propose this sequence of rewriting and communications
as a refinement:

$\{(w_1 = a^n A) \land (w_2 = c^n B) \land (w_3 = Q_2) \land (n \geq 1)\}$

Subproblem 2:

Communication
$\{w_1 = a^n A \land w_2 = c^n B \land w_3 = c^n B \land n \geq 1\}$
Rewrite
$Prog_1$ rewrites A to A', $Prog_2$ rewrites B to $Q_1$ and $Prog_3$ rewrites
B to D.
We do not allow any possibility other than
$w_1 = a^n A' \land w_2 = c^n Q_1 \land w_3 = c^n D$.
To be sure that $w_1 = a^n A'$ after the rewriting step, we need $Prog_2$ to
be defined only for A', and after $Prog_1$ introduces A' it should rewrite
it to another nonterminal and not introduce A' anymore.
$\{w_1 = a^n A' \land w_2 = c^n Q_1 \land w_3 = c^n D \land n \geq 1\}$
Communication
$\{(w_1 = a^n A') \land (w_2 = c^n a^n A') \land (w_3 = c^n D) \land (n \geq 1)\}$
In the case of Subproblem 3 this is a possible refinement:

$\{(w_1 = a^n A') \land (w_2 = c^n a^n A') \land (w_3 = c^n D) \land (n \geq 1)\}$

Subproblem 3: $Rewrite^{m+1}$, with $m \geq 1$

$Prog_1$ rewrites A' to A'' and rewrites m − 1 times A'' to bA'', and then
rewrites A'' to $bQ_3$, $Prog_2$ always rewrites A' to A' and $Prog_3$
rewrites D to D', then D' to D'' and rewrites m − 1 times D'' to dD''.

$\{(w_1 = a^n b^m Q_3) \land (w_2 = c^n a^n A') \land (w_3 = c^n d^{m-1} D'') \land (n, m \geq 1)\}$
Refinement for Subproblem 4 and Subproblem 5 is very simple:

$\{(w_1 = a^n b^m Q_3) \land (w_2 = c^n a^n A') \land (w_3 = c^n d^{m-1} D'') \land (n, m \geq 1)\}$

Subproblem 4: Communication
$\{(w_1 = a^n b^m c^n d^{m-1} D'') \land (w_2 = c^n a^n A') \land (w_3 = c^n d^{m-1} D'') \land (n, m \geq 1)\}$




Subproblem 5: Rewrite

$Prog_1$ rewrites D'' to d

$\{w_1 \in \{a^n b^m c^n d^m \mid n, m \geq 1\}\}$
Equivalently we propose a non-returning, non-centralized grammar system
$\Gamma$ with three regular components, defined in this way:


$\Gamma = (N, K, \{a, b, c, d\}, (P_1, S_1), (P_2, S_2), (P_3, S_3))$


where:
$N = \{S_1, S_2, S_3, A, A', A'', B, D, D', D''\}$,
$K = \{Q_1, Q_2, Q_3\}$,


$P_1 = \{S_1 \to aS_1,\; S_1 \to aA,\; A \to A',\; A' \to A'',\; A'' \to bA'',\; A'' \to bQ_3,\; D'' \to d\}$,

$P_2 = \{S_2 \to cS_2,\; S_2 \to cB,\; B \to Q_1,\; A' \to A'\}$,

$P_3 = \{S_3 \to S_3,\; S_3 \to Q_2,\; B \to D,\; D \to D',\; D' \to D'',\; D'' \to dD''\}$.
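
The behaviour of the three components can be explored by brute force before attempting the correctness proof. The Python sketch below simulates a non-returning PC grammar system under the usual semantics, which is assumed here: synchronous rewriting steps when no query symbol is present, otherwise communication steps in which a query is satisfied only if the queried string is itself free of queries, and blocking when a non-terminal string cannot be rewritten. The productions are the ones listed above; the function names, the token encoding and the bound `max_len` on the length of every component are assumptions of this sketch.

```python
from collections import deque
from itertools import product

# One rule table per component; strings are tuples of tokens.
RULES = [
    {"S1": [["a", "S1"], ["a", "A"]], "A": [["A'"]], "A'": [["A''"]],
     "A''": [["b", "A''"], ["b", "Q3"]], "D''": [["d"]]},
    {"S2": [["c", "S2"], ["c", "B"]], "B": [["Q1"]], "A'": [["A'"]]},
    {"S3": [["S3"], ["Q2"]], "B": [["D"]], "D": [["D'"]], "D'": [["D''"]],
     "D''": [["d", "D''"]]},
]
TERMINALS = {"a", "b", "c", "d"}
QUERY_TARGET = {"Q1": 0, "Q2": 1, "Q3": 2}  # query symbol -> component index


def is_terminal(w):
    return all(s in TERMINALS for s in w)


def has_query(w):
    return any(s in QUERY_TARGET for s in w)


def rewrite_options(w, rules):
    """Results of one rewriting step; [w] for a terminal string, [] if blocked."""
    if is_terminal(w):
        return [w]
    return [w[:i] + tuple(rhs) + w[i + 1:]
            for i, s in enumerate(w) for rhs in rules.get(s, [])]


def communicate(config):
    """One communication step; None if no query can be satisfied (circularity)."""
    new, changed = list(config), False
    for i, w in enumerate(config):
        targets = [QUERY_TARGET[s] for s in w if s in QUERY_TARGET]
        if targets and not any(has_query(config[j]) for j in targets):
            new[i] = tuple(t for s in w
                           for t in (config[QUERY_TARGET[s]] if s in QUERY_TARGET else (s,)))
            changed = True  # non-returning: the queried components keep their strings
    return tuple(new) if changed else None


def successors(config):
    if any(has_query(w) for w in config):
        nxt = communicate(config)
        return [] if nxt is None else [nxt]
    options = [rewrite_options(w, rules) for w, rules in zip(config, RULES)]
    return list(product(*options))


def language(max_len):
    """Terminal strings of the first component, with every component bounded by max_len."""
    start = (("S1",), ("S2",), ("S3",))
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        config = queue.popleft()
        if is_terminal(config[0]):
            words.add("".join(config[0]))
            continue
        for nxt in successors(config):
            if nxt not in seen and all(len(w) <= max_len for w in nxt):
                seen.add(nxt)
                queue.append(nxt)
    return words


found = language(8)
```

With the bound 8 the search covers the words $a^n b^m c^n d^m$ of length at most 8. Every other interleaving of rule choices ends in a blocked configuration, which is the deadlock argument used in the proof.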


(III) The last and most difficult step is to prove overall correctness.
In this case this means that we have to use the Owicki-Gries theory to
show that the multiprogram P we constructed satisfies the following
specification:


$\{(w_1 = S_1) \land (w_2 = S_2) \land (w_3 = S_3)\}\; P\; \{w_1 \in L_{cd}\}$


Furthermore, P outputs the word $a^n b^m c^n d^m$ for any input formed by a
pair of positive integers n, m. This is equivalent to proving that
$L(\Gamma) = L_{cd}$.

According to the definition of $Prog_1$, $Prog_2$ and $Prog_3$, which behave
like $G_1$, $G_2$ and $G_3$, respectively, we propose the following invariant:


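Inv:

$[(w_1 = a^n S_1 \land n \geq 0) \lor (w_1 = a^n A \land n \geq 1) \lor (w_1 = a^n A' \land n \geq 1) \lor (w_1 = a^n b^v A'' \land v \geq 1 \land n \geq 0) \lor (w_1 = a^n b^v Q_3 \land v \geq 1 \land n \geq 1) \lor (w_1 = a^e b^f c^g d^h D'' \land e, f, g \geq 1 \land h \geq 0) \lor (w_1 = a^e b^f c^g d^h \land e, f, g, h \geq 1)]$

$\land\; [\ldots \lor (w_2 = c^q B \land q \geq 1) \lor (w_2 = c^q Q_1 \land q \geq 1) \lor (w_2 = c^q a^r A' \land q, r \geq 1)]$

$\land\; [\ldots \lor (w_3 = Q_2) \lor (w_3 = c^n B \land n \geq 1) \lor \ldots \lor (w_3 = c^n d^m D'' \land n \geq 1 \land m \geq 0)]$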


But the Owicki-Gries theory of global correctness can be used to prove
that after n rewriting steps, with $n \geq 1$, the only possible combination
of values for the sentential forms $w_1$, $w_2$ and $w_3$ that does not lead
to a deadlock is the one expressed by the state:


$\{(w_1 = a^n A) \land (w_2 = c^n B) \land (w_3 = Q_2) \land (n \geq 1)\}$


From this state it can be proved that the only valid continuation is the se-
quence of rewriting and communications described in step 2 of the refinement
process, which reaches the state
$\{w_1 \in \{a^n b^m c^n d^m \mid n, m \geq 1\}\}$.


The strategy we have presented above differs from the traditional approach
not in complexity, because the number of cases considered in the proofs is the
same, but in the way the problem is approached. We suggest that the Owicki-
Gries methodology could provide more possibilities for reasoning about prob-
lems than the strategies commonly used in the grammar system framework
because:


1. It makes it possible to reason in a forward or data-driven way, as does 
case analysis, but also in a backward or goal-directed way. The notion of 
backward reasoning comes from psychology, as is pointed in [11] where this 
description of problem solving is given: We may have a choice between 
starting with where we wish to end, or starting with where we are at the 
moment. In the first instance we start by analyzing the goal. We ask, 
“Suppose we did achieve the goal, how would things be different- what 
subproblems would we have solved, etc.?”. This in turn would determine 
the sequence of problems, and we would work back to the beginning. In 
the second instance we start by analyzing the present situation, see the 
implications of the given conditions and lay-out, and attack the various 
subproblems in a “forward direction”. 

2. Problems can be divided into subproblems because of the theorem: for
any $Q$, $\{P\}\,S_0; S_1\,\{R\} \Leftarrow \{P\}\,S_0\,\{Q\} \land \{Q\}\,S_1\,\{R\}$, where $P$, $Q$, $R$ are
predicates and $S_0$, $S_1$ are instructions. Goals and subgoals are also discussed
in the psychology text mentioned above ([11]): The person perceives in
his surrounding goals capable of removing his needs and fulfilling his desires...
And there is the important phenomenon of emergence of subgoals.
The pathways to goals are often perceived as organized into a number of
subparts, each of which constitutes an intermediate subgoal to be attained
on the way to the ultimate goal. These characteristics suggest that Owicki-Gries
strategies are closer to how humans reason.
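
The composition rule in item 2 can be checked by brute force on a small state space. The following is only a minimal sketch under stated assumptions: the state is a single integer, and the instructions `s0`, `s1`, the predicates `p`, `q`, `r` and the helper `holds` are hypothetical names chosen for this illustration; they are not part of the Owicki-Gries theory itself.

```python
from typing import Callable, Iterable

State = int
Pred = Callable[[State], bool]
Stmt = Callable[[State], State]


def holds(pre: Pred, stmt: Stmt, post: Pred, states: Iterable[State]) -> bool:
    """Check the triple {pre} stmt {post} on every listed start state."""
    return all(post(stmt(s)) for s in states if pre(s))


def seq(first: Stmt, second: Stmt) -> Stmt:
    """Sequential composition: run `first`, then `second`."""
    return lambda s: second(first(s))


# Hypothetical instructions S0, S1 and predicates P, Q, R.
def s0(s: State) -> State:
    return s + 1


def s1(s: State) -> State:
    return 2 * s


def p(s: State) -> bool:
    return s >= 0


def q(s: State) -> bool:
    return s >= 1


def r(s: State) -> bool:
    return s >= 2


STATES = range(-50, 51)
sub1 = holds(p, s0, q, STATES)            # {P} S0 {Q}
sub2 = holds(q, s1, r, STATES)            # {Q} S1 {R}
whole = holds(p, seq(s0, s1), r, STATES)  # {P} S0; S1 {R}
```

Here the intermediate predicate `q` plays the role of the subgoal: once both subproblems are established, the triple for the composed program follows.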


We give another example showing the combined use of the Owicki-Gries
strategy and the so-called technique of refinement of problems.

Example 2. We are looking for a CCPC grammar system that generates this
language: $\{xcxc \mid x \in \{a, b\}^*\}$


A possible division into subproblems could be:

$\{w_1 = S_1 \land w_2 = S_2 \land w_3 = S_3 \land w_4 = S_4\}$

Subproblem 1: $(Rewrite)^{n+1}$, $n \geq 0$

$Prog_4$ generates $xX$, $x \in \{a, b\}^*$, $|x| = n$.

$\{w_1 = S_1' \land w_2 = S_2' \land w_3 = S_3' \land w_4 = xX \land x \in \{a, b\}^* \land |x| = n \land X \in N\}$

Subproblem 2: Communication

$Prog_4$ communicates with $Prog_2$ and $Prog_3$ sending two copies of $w_4$.

$\{w_1 = S_1' \land w_2 = xX \land w_3 = xX \land w_4 = S_4 \land x \in \{a, b\}^* \land |x| = n \land X \in N\}$

Subproblem 3: $(Rewrite)^p$, $p > 0$

$Prog_2$ and $Prog_3$ replace $X$ by $c$.

$\{w_1 = S_1' \land w_2 = xc \land w_3 = xc \land w_4 = yX \land x \in \{a, b\}^* \land |x| = n \land X \in N \land y \in \{a, b\}^* \land |y| = p\}$

Subproblem 4: Communication

$Prog_1$ receives the content of $w_2$ and $w_3$ from $Prog_2$ and $Prog_3$.

$\{w_1 = xcxc \land w_2 = yX \land w_3 = yX \land w_4 = S_4 \land x \in \{a, b\}^* \land |x| = n \land X \in N \land y \in \{a, b\}^* \land |y| = p\}$


As we can see, with the Owicki-Gries theory we can simultaneously propose
a multiprogram $P$ with programs $Prog_1$, $Prog_2$, $Prog_3$ and $Prog_4$ and prove
its correctness with respect to the precondition $\{w_1 = S_1 \land w_2 = S_2 \land w_3 = S_3 \land w_4 = S_4\}$
and postcondition $\{w_1 \in \{xcxc \mid x \in \{a, b\}^*\}\}$.

Equivalently, we simultaneously define a grammar system $\Gamma_3 \in CCPC_4(CF)$
and the proof that $L(\Gamma_3) = \{xcxc \mid x \in \{a, b\}^*\}$, where $\Gamma_3$ is defined in this
way:


$\Gamma_3 = (\{S_1, S_2, S_3, S_4, S_1', S_2', S_3', S_4', X\}, \{a, b, c\}, (S_1, P_1, R_1), (S_2, P_2, R_2), (S_3, P_3, R_3), (S_4, P_4, R_4))$


where:

$P_1 = \{S_1 \to S_1'\}$, $R_1 = \{a, b\}^* c$,

$P_2 = \{S_2 \to S_2', X \to c\}$, $R_2 = \{a, b\}^* X$,

$P_3 = \{S_3 \to S_3', X \to c\}$, $R_3 = \{a, b\}^* X$,

$P_4 = \{S_4 \to aS_4, S_4 \to bS_4, S_4 \to X\}$, $R_4 = \emptyset$.

We do not include here the proof of global correctness of the proposed
program, but encourage the reader to carry it out.
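
The four subproblems can be traced mechanically for a chosen $x$. The sketch below is a simplified simulation, not a general CCPC interpreter, and it rests on assumptions that are not taken from the text: a component with no applicable rule simply waits, a receiver concatenates the matching forms in component order, and a sender that receives nothing returns to its axiom. The names `RULES`, `SELECTORS`, `rewrite`, `communicate` and `generate` are hypothetical.

```python
import re

# Productions of Gamma_3, per component: nonterminal -> list of right-hand sides.
RULES = {
    1: {"S1": [["S1'"]]},
    2: {"S2": [["S2'"]], "X": [["c"]]},
    3: {"S3": [["S3'"]], "X": [["c"]]},
    4: {"S4": [["a", "S4"], ["b", "S4"], ["X"]]},
}
# Selector languages R_i as regular expressions; None stands for the empty language.
SELECTORS = {1: "[ab]*c", 2: "[ab]*X", 3: "[ab]*X", 4: None}
AXIOMS = {i: [f"S{i}"] for i in RULES}


def rewrite(form, rules, pick=0):
    """Rewrite the leftmost rewritable nonterminal; wait if no rule applies."""
    for pos, sym in enumerate(form):
        if sym in rules:
            return form[:pos] + rules[sym][pick] + form[pos + 1:]
    return form


def communicate(forms):
    """One simplified communication step by command."""
    received = {i: [] for i in forms}
    sent = set()
    for i, selector in SELECTORS.items():
        if selector is None:
            continue
        for j in sorted(forms):
            if j != i and re.fullmatch(selector, "".join(forms[j])):
                received[i] = received[i] + forms[j]
                sent.add(j)
    result = {}
    for i in forms:
        if received[i]:
            result[i] = received[i]
        elif i in sent:
            result[i] = list(AXIOMS[i])
        else:
            result[i] = forms[i]
    return result


def generate(x):
    """Follow Subproblems 1-4 for a given x over {a, b} and return w_1."""
    forms = {i: list(AXIOMS[i]) for i in RULES}
    # Subproblem 1: n + 1 rewriting steps, Prog4 derives xX.
    for pick in ["ab".index(ch) for ch in x] + [2]:
        forms = {i: rewrite(forms[i], RULES[i], pick if i == 4 else 0) for i in forms}
    forms = communicate(forms)  # Subproblem 2: Prog2 and Prog3 receive xX.
    # Subproblem 3: Prog2 and Prog3 replace X by c (Prog4 derives X again).
    forms = {i: rewrite(forms[i], RULES[i], 2 if i == 4 else 0) for i in forms}
    forms = communicate(forms)  # Subproblem 4: Prog1 receives xc twice.
    return "".join(forms[1])
```

For instance, `generate("ab")` follows the assertions of the refinement step by step and ends with $w_1 = abcabc$.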

So far the main effort in grammar system theory has focused on finding 
grammar systems with the fewest possible number of grammars and more re- 
stricted productions, to show how distribution and communication can make 
simple components very powerful when they work together. Some studies on 
the computational complexity measure of PC grammar systems that consid- 
ers the number of communications between grammars have been presented 
in [10] and [14]. This apart, the most investigated complexity measure is the 
number of grammars that a PC grammar system consists of, which is clearly 
a descriptional complexity measure. So a very important matter has been 
forgotten: the efficient use of time. The opposite has happened in the pro- 
gramming area (see [7]), where research has focused on looking for techniques 
to parallelize algorithms and to help programmers to design more efficient 
concurrent algorithms. 

Although there are no recipes to follow, in some cases we can construct 
efficient grammar systems using some of the methodical approaches developed 
in the programming framework that maximize the range of options considered 
and that provide mechanisms for evaluating alternatives. 

For example, if we calculate the time that the grammar system $\Gamma_3$ defined
above spends to generate a string $xcxc$ with $x \in \{a, b\}^*$ and $|x| = n$, it is
$O(n)$ in the best case. If we want to improve the efficiency of $\Gamma_3$ in terms of
the time taken to produce a string, we can try to apply some of the strategies
developed in the programming framework to design parallel algorithms. For
this example we can apply so-called functional decomposition.


Definition 2. (Functional decomposition) Functional decomposition is a
partitioning strategy used to design concurrent algorithms. This approach
uses computation to expose opportunities for parallel execution. Hence, the
idea is to define a large number of small tasks in order to yield a fine-grained
decomposition of a problem.


Example 8. We can apply the functional decomposition strategy to $\Gamma_3$ to
generate another grammar system $\Gamma_4$ that solves this problem in less time.


We focus on the computation of the string x € {a,b}* and we discover 
that this task can be done by m grammars working simultaneously instead 
of only one grammar. Thus, we can reduce the time to O(n/m), in the best 
case.

For defining $\Gamma_3$ we proposed this refinement of Subproblem 1:

$\{w_1 = S_1 \land w_2 = S_2 \land w_3 = S_3 \land w_4 = S_4\}$

Subproblem 1: $(Rewrite)^{n+1}$, $n \geq 0$

$Prog_4$ generates $xX$, $x \in \{a, b\}^*$, $|x| = n$.

$\{w_1 = S_1' \land w_2 = S_2' \land w_3 = S_3' \land w_4 = xX \land x \in \{a, b\}^* \land |x| = n\}$


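To improve efficiency we propose this other refinement of the same subproblem:

$\{w_1 = S_1 \land w_2 = S_2 \land w_3 = S_3 \land w_4 = S_4 \land \ldots \land w_{i+4} = S_{i+4} \land \ldots \land w_{m+3} = S_{m+3} \land w_{m+4} = S_{m+4} \land 1 \leq i \leq m-1 \land 1 \leq m\}$

Subproblem 1:

$(Rewrite)^{t+1}$, $t \geq 0$

$Prog_{i+4}, \ldots, Prog_{m+3}$ generate $x_1, \ldots, x_{m-1} \in \{a, b\}^*$, $1 \leq i \leq m-1$,
$1 \leq m$, and $Prog_{m+4}$ generates $x_m Y$, $x_m \in \{a, b\}^*$.

$\{w_1 = S_1' \land w_2 = S_2' \land w_3 = S_3' \land w_4 = S_4' \land \ldots \land w_{i+4} = x_i \land \ldots \land w_{m+3} = x_{m-1} \land w_{m+4} = x_m Y \land 1 \leq i \leq m-1 \land 1 \leq m \land x_1, \ldots, x_m \in \{a, b\}^* \land Y \in N\}$

Communication

$Prog_4$ receives the $x_1, \ldots, x_{m-1} \in \{a, b\}^*$ produced by $Prog_5, \ldots, Prog_{m+3}$,
followed by $x_m Y$, $x_m \in \{a, b\}^*$, produced by $Prog_{m+4}$.

$\{w_1 = S_1' \land w_2 = S_2' \land w_3 = S_3' \land w_4 = x_1 \ldots x_m Y \land \ldots \land w_{i+4} = S_{i+4} \land \ldots \land w_{m+3} = S_{m+3} \land w_{m+4} = S_{m+4} \land 1 \leq i \leq m-1 \land 1 \leq m \land x_1 \ldots x_m \in \{a, b\}^* \land Y \in N\}$

$(Rewrite)^{s+1}$, $s \geq 0$

$Prog_4$ replaces $Y$ by $X$.

$\{w_1 = S_1' \land w_2 = S_2' \land w_3 = S_3' \land w_4 = x_1 \ldots x_m X \land \ldots \land w_{i+4} = y_i \land \ldots \land w_{m+3} = y_{m-1} \land w_{m+4} = y_m Y \land 1 \leq i \leq m-1 \land 1 \leq m \land x_1 \ldots x_m \in \{a, b\}^* \land y_1, \ldots, y_m \in \{a, b\}^* \land Y \in N\}$

The rest is analogous to the analysis we made for $\Gamma_3$, and according to our
previous analysis we get $\Gamma_4 \in CCPC_{m+4}(CF)$, $m \geq 1$, defined in this way: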


$\Gamma_4 = (\{S_1, S_2, S_3, \ldots, S_{m+4}, S_1', S_2', S_3', \ldots, S_{m+4}', X, Y\}, \{a, b, c\}, (S_1, P_1, R_1), (S_2, P_2, R_2), \ldots, (S_{m+4}, P_{m+4}, R_{m+4}))$

where:


$P_1 = \{S_1 \to S_1'\}$, $R_1 = \{a, b\}^* c$,

$P_2 = \{S_2 \to S_2', X \to c\}$, $R_2 = \{a, b\}^* X$,

$P_3 = \{S_3 \to S_3', X \to c\}$, $R_3 = \{a, b\}^* X$,

$P_4 = \{S_4 \to S_4', Y \to X\}$, $R_4 = \{a, b\}^* Y \cup \{a, b\}^*$,

$P_{i+4} = \{S_{i+4} \to aS_{i+4}, S_{i+4} \to bS_{i+4}, S_{i+4} \to a, S_{i+4} \to b, S_{i+4} \to \lambda\}$,
$R_{i+4} = \emptyset$, $1 \leq i \leq m-1$,

$P_{m+4} = \{S_{m+4} \to aS_{m+4}, S_{m+4} \to bS_{m+4}, S_{m+4} \to Y\}$, $R_{m+4} = \emptyset$, $1 \leq m$.


The proof of global correctness of the multiprogram proposed is left as an 
exercise for the reader. 

We improve efficiency, because $\Gamma_4$ generates strings $xcxc$ with $x \in \{a, b\}^*$
and $|x| = n$ in $O(n/m)$, in the best case. And we show with this example
that we can use strategies available in the programming framework to design 
grammar systems that derive strings in less time. 
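
The gain can be made concrete with a rough step count. The sketch below is an illustration under assumptions that go beyond the text: it counts only the rewriting steps of the generating components, ignores the constant number of communications and remaining rewriting steps, and assumes that $x$ is split into $m$ pieces of nearly equal length. The function names are hypothetical.

```python
import math


def steps_single(n: int) -> int:
    """Rewriting steps one generator needs to derive xX with |x| = n:
    one step per letter plus one step to introduce X."""
    return n + 1


def steps_parallel(n: int, m: int) -> int:
    """Best-case rewriting steps when x is split evenly over m generators
    that work simultaneously; the extra step introduces the marker Y."""
    longest_piece = math.ceil(n / m)
    return longest_piece + 1
```

With $n = 12$ and $m = 4$, the single generator needs 13 rewriting steps while the parallel generators need 4, which matches the $O(n)$ versus $O(n/m)$ behaviour stated above.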


4.2 How can programming benefit from grammar systems? 


If, for example, we want to prove in the grammar system theory that there 
is no grammar system with n components with a certain protocol of commu- 
nication that generates a language L, we use analysis by cases and induction 
strategies. If we translate this problem to the programming framework, we 
have to prove that it is not possible to find a multiprogram P with n pro- 
grams running concurrently, that communicate with the same protocol and 
which is correct with respect to this specification: 


$\{(w_1 = S_1) \land (w_2 = S_2) \land \ldots \land (w_n = S_n) \land n \geq 1\}\ P\ \{w_1 \in L\}$


But in the programming framework we have no strategies for reasoning in
the negative way. The only strategies available in this framework are verification,
which consists of taking a given multiprogram and proving its correctness with
respect to a specification (example 4), and the constructive approach, which
we have exemplified with theorem 6 and examples 7 and 9, and which consists of
simultaneously constructing a program and the proof of its correctness with respect
to a specification. Both strategies are useful for getting positive results.

The lack of strategies that can prove this kind of negative result in the 
programming framework makes us think of the possibility of translating them 
to the grammar system framework and using the tools available there to solve 
them. 

For example, let us take the problem of proving that there is no
grammar system of any type with two regular components that can generate
$L_{cd}$. If we get a solution for this we prove that $\Gamma_{cd} \in NPC_3(REG)$ is
the most economical solution with respect to the number of components. If
we translate this problem into the programming framework, we want to prove
that it is not possible to find a multiprogram $P$ with two programs $Prog_1$
and $Prog_2$ running concurrently, modifying $w_1$ and $w_2$ in a right-linear way,
which is correct with respect to this specification: $\{(w_1 = S_1) \land (w_2 = S_2)\}$
$P$ $\{w_1 \in \{a^n b^m c^n d^m \mid n, m \geq 1\}\}$. But because there is no strategy in the
programming framework to solve this kind of problem, we solve it with the tools
available in the grammar system framework, namely analysis by cases. The
proof of the theorem $L_{cd} \notin X_2(REG)$, for $X \in \{PC, CPC, NPC, NCPC\}$,
can be found in [8].

Now if we go back to the topic of efficiency that we pointed out above, the 
grammar system framework concentrates on agglomerating tasks as much as
possible. The aim is to get grammar systems with fewer grammars, to prove 
the power of communication. The opposite is the case in the programming 
framework where programmers try to partition programs into as many tasks 
as necessary to improve efficiency in time, looking for strategies to parallelize 
programs. So it looks like researchers are working in different directions. 

But some results of the grammar system theory can benefit the concurrent 
programming framework. For example, this theorem that makes it possible to 
transform a grammar system of m grammars into a grammar system of n 
grammars that generate the same language: 


$CF = CD_1CF(t) = CD_2CF(t) \subset CD_3CF(t)$ and (1)


$CD_3CF(t) = CD_*CF(t) = ET0L$ (2)


There are many theorems of this kind in the grammar system theory that 
translated to the programming framework speak about the number of pro- 
grams needed to generate a certain language (refer to [4]). This is a contribu- 
tion by grammar system theory to the programming framework, where there 
are no results about the number of programs needed to solve a problem. It 
would be very interesting for the design of concurrent programs if some of 
these transformations were also to consider efficiency. Any results about how 
to transform a multiprogram P that has m programs running concurrently
into a multiprogram with n programs that solves the same problem more efficiently
would be a great contribution to the concurrent programming theory.

5 Conclusions and Future Work


In this paper we briefly outline and illustrate the strong relationship between 
two mechanisms of distributed and cooperating computations: grammar sys- 
tems and concurrent programming. We show that it is possible to automati- 
cally translate a grammar system into a concurrent program and make proofs 
using the tools available in the programming framework. The problem used 
to show the translation from a grammar system to a concurrent program was 
a homogeneous CD grammar system with the = k mode of derivation. But this
automatic translation can also be done for all the other models of grammar 
systems: homogeneous CD grammar systems with the other modes of deriva- 
tions, hybrid CD grammar systems and networks of language processors. 

The traditional approach to the problem of finding a grammar system 
generating a given language is: first propose a grammar system and then find a 
proof that it generates the language. In this paper we present a new approach, 
taken from the programming framework. It consists of simultaneously finding 
the grammar system that generates a given language and a proof that the 
grammar system found generates it. We think that it would be interesting to 
study this approach in more detail, and try to apply it to other well-known 
languages. We could even try to find other programming strategies, apart 
from the strategy of refinement of problems shown here, that could be useful 
in solving problems related to grammar system theory. 

Until now not much attention has been paid to the time taken to generate 
a language with a grammar system, while in the programming framework the 
efficiency issue has been the main topic of research in recent years. We propose 
to follow some of the methodical approaches developed in the programming 
framework to construct more efficient grammar systems. 

Moreover we can think about how programming theory can benefit from 
grammar system theory. The lack of strategies in the programming framework
to prove negative results of the type $L \neq L(\Gamma)$ for a language $L$ and
any grammar system $\Gamma$ makes us think that such problems might be solved
by translating them into the grammar system framework where they can be 
solved using the tools available there. 

It is our opinion that this work opens up possibilities for further research
and that it seems worthwhile to continue in this direction.

Summary. This paper is a summary of my seminars given in the Research Group
on Mathematical Linguistics in the year 2005. It is a short survey on automata 
theory, including finite state automata and tree automata. The transformations 
(transductions) induced by finite state automata and tree automata are given. 


1 Preliminaries and Notations 


In this section, some basic notations about formal language theory and au- 
tomata theory are given. Other concepts can be found in the references 
[1, 2, 5, 7]. 

A set is a collection of objects. We use $|S|$ to denote the cardinality of a set $S$,
i.e., the number of elements of $S$. If $|S| < \infty$, then $S$ is called a finite set;
otherwise, $S$ is called an infinite set. An alphabet is a finite set of symbols.
A (formal) language is a set of strings of symbols from some alphabet. An
element $x$ of a language $L$ is called a word, written $x \in L$.

A semigroup $(M, *)$ consists of a set $M$ and a binary associative operation $*$ on
$M$, i.e., for all $m_1, m_2, m_3 \in M$ we have $(m_1 * m_2) * m_3 = m_1 * (m_2 * m_3)$.
A semigroup which has a neutral element, denoted by $1_M$, is
a monoid. Given two subsets $A$, $B$ of a monoid $M$, the product $AB$ is defined
by


$AB = \{c \in M \mid \exists a \in A, \exists b \in B : c = ab\}$. (1)


Recognizable sets and rational sets are two important language families. 
They are of distinct nature and Kleene’s Theorem precisely asserts that they 
coincide in finitely generated free monoids. Some properties of regular lan- 
guages like closure properties can be proved for recognizable subsets, others 
for the rational subsets of a monoid. Here we give the formal definition of 
recognizable sets and rational sets. 


Definition 1. Let $M$ be a monoid. A subset $A$ of $M$ is recognizable if there
exist a finite monoid $N$, a morphism $\alpha$ from $M$ into $N$ and a subset $P$ of $N$
such that $A = \alpha^{-1}(P)$.


The set of all recognizable subsets of M is denoted by Rec(M). 


Example 1. Let $M$ be any monoid, $N = \{1\}$. Let $\alpha$ be the unique morphism
from $M$ onto $N$. Then $M, \emptyset \in Rec(M)$.


Example 2. If M is a finite monoid, then any subset of M is recognizable. 


Example 3. If $M = X^*$ and $X$ is an alphabet, then $A \in Rec(X^*)$ iff $A$ is
recognized by a finite automaton.
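
Definition 1 can be illustrated on a concrete case. The sketch below is an illustration chosen for this purpose and does not come from the text: it takes $M = \{a, b\}^*$, the finite monoid $N = \mathbb{Z}/2\mathbb{Z}$ under addition, the morphism `alpha` that counts occurrences of `a` modulo 2, and $P = \{0\}$, so that $A = \alpha^{-1}(P)$ is the set of words with an even number of `a`'s. All names are hypothetical.

```python
from itertools import product

P = {0}  # the chosen subset of N = Z/2Z


def alpha(word: str) -> int:
    """Morphism from {a, b}* into Z/2Z: number of a's modulo 2."""
    return word.count("a") % 2


def in_A(word: str) -> bool:
    """Membership in A = alpha^{-1}(P)."""
    return alpha(word) in P


# Check the morphism property alpha(uv) = alpha(u) + alpha(v) on short words.
WORDS = ["".join(w) for n in range(4) for w in product("ab", repeat=n)]
is_morphism = all(
    alpha(u + v) == (alpha(u) + alpha(v)) % 2 for u in WORDS for v in WORDS
)
```

The two elements of $N$ correspond to the two states of the usual parity automaton, which is the content of Example 3 in this small case.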


Definition 2. Let $M$ be a monoid. The family $Rat(M)$ of rational subsets of
$M$ is the least family $R$ of subsets of $M$ satisfying the following conditions:


(i) $\emptyset \in R$; $\{m\} \in R$ for all $m \in M$,
(ii) if $A, B \in R$, then $A \cup B, AB \in R$,
(iii) if $A \in R$, then $A^+ = \bigcup_{n > 0} A^n \in R$.

Remark 1. A rational subset of $M$ is either empty or can be expressed, starting
with singletons, by a finite number of unions, products, and plus or star operations
(rational expression).


Definition 3. Let $X$ and $Y$ be alphabets. A rational (resp. recognizable)
relation over $X$ and $Y$ is a rational (resp. recognizable) subset of the monoid
$X^* \times Y^*$.

Definition 4. Let $M$ and $M'$ be monoids. A rational relation over $M$ and
$M'$ is a rational subset of $M \times M'$.


The “dynamic” notion of rational relation is a rational transduction. A
transduction $\tau$ from $X^*$ into $Y^*$ is a function from $X^*$ into the power set
$\mathcal{P}(Y^*)$. For convenience, we write $\tau : X^* \to Y^*$.


Theorem 1 (Nivat, 1968). Let $X$ and $Y$ be alphabets. The following conditions
are equivalent:


(i) $A \in Rat(X^* \times Y^*)$,
(ii) There exist an alphabet $Z$, two morphisms $\varphi : Z^* \to X^*$, $\psi : Z^* \to Y^*$
and a regular language $K \subseteq Z^*$ such that

$A = \{(\varphi h, \psi h) : h \in K\}$

(iii) There exist an alphabet $Z$, two alphabetic morphisms $\alpha : Z^* \to X^*$,
$\beta : Z^* \to Y^*$ and a regular language $K \subseteq Z^*$ such that

$A = \{(\alpha h, \beta h) : h \in K\}$

(iv) There exist an alphabet $Z$, two alphabetic morphisms $\alpha : Z^* \to X^*$,
$\beta : Z^* \to Y^*$ and a local regular language $K \subseteq Z^*$ such that

$A = \{(\alpha h, \beta h) : h \in K\}$


2 Transformations Induced by FSTs 


The machines realizing rational transductions are called transducers. The au- 
tomaton reads input words over alphabet X, and emits output words over 
alphabet Y. Thus, the automaton realizes a rational transduction. The fol- 
lowing is the mathematical definition of a transducer. 


2.1 Finite transducer 


Definition 5. A finite transducer is a 6-tuple $T = (Q, \Sigma, \Delta, \delta, s, F)$, where


1. $Q$ is the finite set of states,
2. $\Sigma$ is the input alphabet,
3. $\Delta$ is the output alphabet,
4. $\delta$ is the transition-and-output function from a finite subset of $Q \times \Sigma^*$ to
finite subsets of $Q \times \Delta^*$,
5. $s \in Q$ is the starting state,
6. $F \subseteq Q$ is the set of final states.


Example 4. Figure 1 is the graph representation of the transducer $T =
(\{0, 1, 2\}, \{a, b\}, \{0, 1\}, \delta, 0, \{2\})$, where the transition-and-output function
consists of the following transition rules: $\delta(0, aa) = (0, 0)$, $\delta(0, b) = (1, 101)$,
$\delta(1, \lambda) = (2, 11)$, $\delta(1, b) = \{(1, \lambda), (1, 101)\}$, $\delta(2, a) = (2, 0)$.


Fig. 1. Finite-state transducer


For a given word $u \in \Sigma^*$, we say that $v \in \Delta^*$ is an output of $T$ for $u$
if there exists a state transition sequence of $T$, $(q_1, v_1) \in \delta(s, u_1)$, $(q_2, v_2) \in
\delta(q_1, u_2)$, ..., $(q_n, v_n) \in \delta(q_{n-1}, u_n)$, and $q_n \in F$, i.e.,


$s \xrightarrow{u_1/v_1} q_1 \xrightarrow{u_2/v_2} \cdots \xrightarrow{u_n/v_n} q_n \in F$

such that $u = u_1 \cdots u_n$, $u_1, \ldots, u_n \in \Sigma^*$, and $v = v_1 \cdots v_n$, $v_1, \ldots, v_n \in \Delta^*$.


We write $v \in T(u)$, where $T(u)$ denotes the set of all outputs of $T$ for the
input word $u$.


Definition 6. A finite transducer $T = (Q, \Sigma, \Delta, \delta, s, F)$ is called a generalized
sequential machine (GSM) if $\delta$ is a function from $Q \times \Sigma$ to finite subsets of
$Q \times \Delta^*$, i.e., $T$ reads exactly one symbol at each transition.


2.2 Finite transduction 


For each finite transducer $T = (Q, \Sigma, \Delta, \delta, s, F)$, the transduction induced by
$T$ is $T : \Sigma^* \to 2^{\Delta^*}$. For a language $L \subseteq \Sigma^*$, $T(L) = \bigcup_{u \in L} T(u)$.

Example 5. Consider the transducer of Example 4. The following relations hold
for the transduction induced by $T = (\{0, 1, 2\}, \{a, b\}, \{0, 1\}, \delta, 0, \{2\})$:


1. $T(aabb) = \{010111, 010110111\}$,

2. $T(bbba) = \{101110, 101101110, 101101101110\}$,

3. $T(\lambda) = \emptyset$, $T(aaab) = \emptyset$,

4. $T(\{b, ba\}) = \{10111, 101110\}$.
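
The outputs listed in Example 5 can be recomputed by exploring all transition sequences of the transducer of Example 4. The following is a minimal sketch: the table `DELTA` encodes the five rules of Example 4 (the empty string stands for $\lambda$), while the function names `outputs` and `transduce` are hypothetical. The search assumes that the transducer has no cycle of $\lambda$-input transitions, which holds for this example.

```python
# Transition-and-output function of Example 4: (state, input) -> {(state, output)}.
DELTA = {
    (0, "aa"): {(0, "0")},
    (0, "b"): {(1, "101")},
    (1, ""): {(2, "11")},
    (1, "b"): {(1, ""), (1, "101")},
    (2, "a"): {(2, "0")},
}
START = 0
FINAL = {2}


def outputs(u: str) -> set:
    """Return T(u): all outputs of accepting transition sequences on input u."""
    results = set()
    seen = set()
    stack = [(START, 0, "")]  # (current state, input position, output so far)
    while stack:
        config = stack.pop()
        if config in seen:
            continue
        seen.add(config)
        state, pos, out = config
        if pos == len(u) and state in FINAL:
            results.add(out)
        for (src, label), targets in DELTA.items():
            if src == state and u.startswith(label, pos):
                for nxt, emitted in targets:
                    stack.append((nxt, pos + len(label), out + emitted))
    return results


def transduce(language) -> set:
    """Return T(L) as the union of T(u) over all u in L."""
    result = set()
    for u in language:
        result |= outputs(u)
    return result
```

Calling `outputs("aabb")` or `transduce({"b", "ba"})` reproduces items 1 and 4 of Example 5.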


2.3 Rational relation (transduction) 


Transductions induced by finite transducers can be viewed as a relation $R_T \subseteq
\Sigma^* \times \Delta^*$ defined by

$R_T = \{(u, v) \mid v \in T(u)\}$


Definition 7. A transduction $\tau : X^* \to Y^*$ is rational iff $\tau$ is realized by a
transducer.


Corollary 1. Any rational transduction $\tau : X^* \to Y^*$ can be realized by a
transducer $T = (Q, \Sigma, \Delta, \delta, s, F)$ such that


$\delta \subseteq Q \times (X \cup \{\lambda\}) \times (Y \cup \{\lambda\}) \times Q$ (2)

and further $F$ consists of a single state $q_+ \neq s$, and $(p, u, v, q) \in \delta$ implies
$p \neq q_+$ and $q \neq s$.


Theorem 2 (Nivat, 1968). Let $\Sigma$ and $\Delta$ be finite alphabets. $R \subseteq \Sigma^* \times \Delta^*$
is a rational relation iff there are a finite alphabet $\Gamma$, a regular language $L \subseteq
\Gamma^*$ and two morphisms $g : \Gamma^* \to \Sigma^*$ and $h : \Gamma^* \to \Delta^*$ such that


$R = \{(g(w), h(w)) \mid w \in L\}$
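
The "if" direction of Theorem 2 can be illustrated by enumerating a finite part of a relation from a regular language and two morphisms. The instance below is a hypothetical one chosen for illustration: $\Gamma = \{p, q\}$, $L = p^*q$, $g(p) = a$, $g(q) = \lambda$, $h(p) = b$, $h(q) = c$, which yields $R = \{(a^n, b^n c) \mid n \geq 0\}$. The names `image` and `relation` are assumptions of this sketch.

```python
import re
from itertools import product

GAMMA = "pq"       # auxiliary alphabet
L_REGEX = "p*q"    # regular language L over GAMMA
G = {"p": "a", "q": ""}   # morphism g, given on letters
H = {"p": "b", "q": "c"}  # morphism h, given on letters


def image(morphism: dict, word: str) -> str:
    """Extend a letter-to-string map to a morphism on words."""
    return "".join(morphism[letter] for letter in word)


def relation(max_len: int) -> set:
    """Pairs (g(w), h(w)) for all w in L with |w| <= max_len."""
    pairs = set()
    for n in range(max_len + 1):
        for letters in product(GAMMA, repeat=n):
            w = "".join(letters)
            if re.fullmatch(L_REGEX, w):
                pairs.add((image(G, w), image(H, w)))
    return pairs
```

Only a bounded part of $R$ is enumerated, since $L$ is infinite; the bound `max_len` applies to the words of $L$, not to the components of the pairs.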


Definition 8. Two finite transducers are said to be equivalent if they define 
exactly the same finite transduction. 


Remark 2. The equivalence problem for finite transducers is undecidable; the
equivalence problem for single-valued finite transducers is decidable.


Inverse transduction: Let $T : \Sigma^* \to 2^{\Delta^*}$ be a finite transduction. Then
the inverse of $T$, i.e. $T^{-1} : \Delta^* \to 2^{\Sigma^*}$, is also a finite transduction.


Definition 9. A finite transducer $T = (Q, \Sigma, \Delta, \delta, s, F)$ is in the standard
form if $\delta : Q \times (\Sigma \cup \{\lambda\}) \to 2^{Q \times (\Delta \cup \{\lambda\})}$. Intuitively, the standard form restricts the
input and output of each transition to be only a single letter or $\lambda$.

Theorem 3. Each finite transducer can be transformed into an equivalent
finite transducer in the standard form.


The transformation of an arbitrary finite transducer to an equivalent one 
in the standard form consists of two steps: First, each transition that reads 
more than one letter is transformed into several transitions reading exactly 
one letter. Secondly, each transition that has a string of more than one letter 
as output is transformed into several transitions such that each of them has 
exactly one letter as output. 

The two transformation steps can be represented as follows: 


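(1) A transition $p \xrightarrow{a_1 \cdots a_j / \beta} q$ is replaced by

$p \xrightarrow{a_1/\beta} r_1 \xrightarrow{a_2/\lambda} r_2 \cdots r_{j-1} \xrightarrow{a_j/\lambda} q$,

where $p, q \in Q$, $a_1, \ldots, a_j \in \Sigma$, $j \geq 2$, $\beta \in \Delta^*$, and $r_1, \ldots, r_{j-1}$ are new states.


(2) A transition $p \xrightarrow{a / b_1 \cdots b_k} q$ is replaced by

$p \xrightarrow{a/b_1} r_1 \xrightarrow{\lambda/b_2} r_2 \cdots r_{k-1} \xrightarrow{\lambda/b_k} q$,

where $p, q \in Q$, $a \in \Sigma \cup \{\lambda\}$, $b_1, \ldots, b_k \in \Delta$, $k \geq 2$, and $r_1, \ldots, r_{k-1}$ are new states.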


3 Tree Automata and Tree Transducers 


Tree automata were first introduced by J.W. Thatcher and J.B. Wright, and 
independently by J. Doner around 1965. 

The trees can be regarded as $\Sigma X$-terms over a ranked alphabet $\Sigma$ and a
frontier alphabet $X$. A forest is a set of $\Sigma X$-terms. Here just some basic
definitions and theorems will be given. Further details can be found in the book
Tree Automata and related references.

Ranked alphabets will be denoted by $\Sigma$, $\Delta$, $\Gamma$, etc.; every
element of a ranked alphabet has a rank (arity) $m \geq 0$. Frontier alphabets (or variables)
will be denoted by $X$, $Y$, $Z$, etc.


3.1 Tree automata 


Definition 10. A $\Sigma$-algebra $\mathcal{A}$ is a pair consisting of a nonempty set $A$
(of elements of $\mathcal{A}$) and a mapping that assigns to every operator $\sigma \in \Sigma$ an
$m$-ary operation

$\sigma^{\mathcal{A}} : A^m \to A$,

where $m$ is the arity of $\sigma$. The operation $\sigma^{\mathcal{A}}$ is called the realization of $\sigma$ in
$\mathcal{A}$.


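Definition 11. A homomorphism from a $\Sigma$-algebra $\mathcal{A}$ to a $\Sigma$-algebra $\mathcal{B}$ is a
mapping $\varphi : A \to B$ such that for all $m \geq 0$, $\sigma \in \Sigma_m$, and $a_1, \ldots, a_m \in A$,


$\sigma^{\mathcal{A}}(a_1, \ldots, a_m)\varphi = \sigma^{\mathcal{B}}(a_1\varphi, \ldots, a_m\varphi)$,

and then we write $\varphi : \mathcal{A} \to \mathcal{B}$.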


Definition 12. The set $T_\Sigma(X)$ of $\Sigma$-terms in $X$, or $\Sigma X$-terms for short,
is defined as follows:


1. $X \subseteq T_\Sigma(X)$;

2. $\sigma(t_1, \ldots, t_m) \in T_\Sigma(X)$ whenever $m \geq 0$, $\sigma \in \Sigma_m$ and $t_1, \ldots, t_m \in T_\Sigma(X)$;

3. every $\Sigma X$-term can be obtained by applying the rules (1) and (2) a finite
number of times.


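Example 6. Let $\Sigma = \Sigma_0 \cup \Sigma_1 \cup \Sigma_2$ be a ranked alphabet, where $\Sigma_0 = \{\gamma\}$,
$\Sigma_1 = \{\omega\}$ and $\Sigma_2 = \{\sigma\}$. The frontier alphabet is $X = \{x, y\}$. Then $t =
\omega(\sigma(y, \sigma(y, x)))$ is a $\Sigma X$-tree.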


Tree automata (or tree recognizers) define forests. There are four basic 
types of these recognizers. A frontier-to-root tree recognizer (or F-recognizer) 
reads its input trees from the frontier towards the root. A root-to-frontier tree 
recognizer (or R-recognizer) reads the trees starting at the root towards the 
frontier. 


Definition 13. A frontier-to-root $\Sigma X$-recognizer $\mathbf{A}$ consists of


1. a finite $\Sigma$-algebra $\mathcal{A} = (A, \Sigma)$;
2. an initial assignment $\alpha : X \to A$;
3. a set $A' \subseteq A$ of final states.


We can write $\mathbf{A} = (\mathcal{A}, \alpha, A')$ or $\mathbf{A} = (A, \Sigma, X, \alpha, A')$. The forest recognized
by $\mathbf{A}$ is the $\Sigma X$-forest


$T(\mathbf{A}) = \{t \in T_\Sigma(X) \mid t^{\mathcal{A}}(\alpha) \in A'\}$.
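
Definition 13 amounts to evaluating a tree bottom-up in a finite algebra and testing the resulting state. The sketch below is a hypothetical instance chosen for illustration, not an example from the text: trees are nested tuples, the state set is $\{0, 1\}$, the assignment `ALPHA` sends $x$ to 1 and $y$ to 0, and the operations in `ALGEBRA` add states modulo 2, so the recognizer accepts exactly the trees with an even number of leaves labelled $x$.

```python
# A frontier letter is a string; an operator node is a tuple
# (symbol, child_1, ..., child_m), so a nullary operator is a 1-tuple.
ALGEBRA = {
    "gamma": lambda: 0,                  # realization of the nullary operator
    "omega": lambda a: a,                # realization of the unary operator
    "sigma": lambda a, b: (a + b) % 2,   # realization of the binary operator
}
ALPHA = {"x": 1, "y": 0}  # initial assignment
FINAL = {0}               # final states A'


def evaluate(tree):
    """Value of the tree in the algebra under the initial assignment."""
    if isinstance(tree, str):
        return ALPHA[tree]
    symbol, *children = tree
    return ALGEBRA[symbol](*(evaluate(child) for child in children))


def recognizes(tree) -> bool:
    """True iff the tree belongs to the recognized forest."""
    return evaluate(tree) in FINAL
```

For example, `recognizes(("sigma", "x", "x"))` holds, while a tree with a single leaf $x$ is rejected.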


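Definition 14. A nondeterministic frontier-to-root $\Sigma X$-recognizer $\mathbf{A}$ consists
of


1. a finite ND $\Sigma$-algebra $\mathcal{A} = (A, \Sigma)$;
2. an initial assignment $\alpha : X \to \mathcal{P}(A)$;
3. a set $A' \subseteq A$ of final states.


We write $\mathbf{A} = (\mathcal{A}, \alpha, A')$ or $\mathbf{A} = (A, \Sigma, X, \alpha, A')$. The forest recognized by $\mathbf{A}$
is the $\Sigma X$-forest

$T(\mathbf{A}) = \{t \in T_\Sigma(X) \mid t\alpha \cap A' \neq \emptyset\}$.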


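Definition 15. A nondeterministic root-to-frontier (NDR) $\Sigma X$-recognizer $\mathbf{A}$
consists of


1. a finite NDR $\Sigma$-algebra $\mathcal{A} = (A, \Sigma)$;
2. a set $A' \subseteq A$ of initial states;
3. a final assignment $\alpha : X \to \mathcal{P}(A)$.


We write $\mathbf{A} = (\mathcal{A}, A', \alpha)$ or $\mathbf{A} = (A, \Sigma, X, A', \alpha)$. The forest recognized by $\mathbf{A}$
is the $\Sigma X$-forest

$T(\mathbf{A}) = \{t \in T_\Sigma(X) \mid t\alpha \cap A' \neq \emptyset\}$.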


A deterministic root-to-frontier $\Sigma X$-recognizer (DR) is an NDR $\Sigma X$-recognizer
$\mathbf{A} = (\mathcal{A}, A', \alpha)$ such that $A'$ and all of the sets $\sigma^{\mathcal{A}}(a)$ ($\sigma \in \Sigma_m$, $m \geq 1$,
$a \in A$) and $\sigma^{\mathcal{A}}$ with $\sigma \in \Sigma_0$ contain exactly one element.

Deterministic and nondeterministic frontier-to-root $\Sigma X$-recognizers and
nondeterministic root-to-frontier $\Sigma X$-recognizers have the same power: the
forests recognized by these three types of recognizers are exactly the recognizable forests.
The deterministic root-to-frontier recognizers, however, are weaker and define
a proper subfamily of $Rec$.

Several operations can be defined on recognizable forests. The most
common operations are:


1. forest product: $T(x \leftarrow T_x \mid x \in X) = \bigcup (t(x \leftarrow T_x \mid x \in X) \mid t \in T)$, where $T$
is a $\Sigma X$-forest and $(T_x \mid x \in X)$ is an $X$-indexed family of $\Sigma X$-forests;

2. $z$-product: $S \cdot_z T = T(x \leftarrow T_x \mid x \in X)$, where $S$ and $T$ are $\Sigma X$-forests,
$T_z = S$, and $T_x = x$ for all $x \in X$, $x \neq z$;

3. $z$-quotient: $S^{-z}T = \{p \in T_\Sigma(X) \mid S \cdot_z \{p\} \cap T \neq \emptyset\}$, where $S$ and $T$ are
$\Sigma X$-forests;

4. $\sigma$-product: $\sigma(T_1, \ldots, T_m) = \{\sigma(t_1, \ldots, t_m) \mid t_1 \in T_1, \ldots, t_m \in T_m\}$, where
$\sigma \in \Sigma$ is an $m$-ary operator and $T_1, \ldots, T_m$ are $\Sigma X$-forests.


3.2 Tree transducers 


A tree transformation is defined as a binary relation $\tau \subseteq T_\Sigma(X) \times T_\Omega(Y)$, in
which $T_\Sigma(X)$ and $T_\Omega(Y)$ are the sets of trees as defined in the previous section.
An inclusion $(p, q) \in \tau$ means that $\tau$ transforms $p$ into $q$. Denote a countably
infinite set of auxiliary variables by $\Xi = \{\xi_1, \xi_2, \ldots\}$. Its role is to indicate an
occurrence of a subtree in a tree.

There are two main kinds of tree transducers: frontier-to-root tree
transducers, which process a tree from the leaves to the root, and root-to-frontier
tree transducers, which work in the opposite direction.


Definition 16. A frontier-to-root tree transducer (F-transducer) is a system
$\mathbf{A} = (\Sigma, X, A, \Omega, Y, P, A')$, where $\Sigma$, $\Omega$ are ranked alphabets, $X$, $Y$ are the frontier
alphabets, $A$ is a ranked alphabet consisting of unary operators (the state
set of $\mathbf{A}$), $A' \subseteq A$ is the set of final states and $P$ is a finite set of productions
of the following two types:


1. $x \to a(q)$ ($x \in X$, $a \in A$, $q \in T_\Omega(Y)$);
2. $\sigma(a_1(\xi_1), \ldots, a_m(\xi_m)) \to a(q(\xi_1, \ldots, \xi_m))$ ($\sigma \in \Sigma_m$, $m \geq 0$, $a_1, \ldots, a_m,
a \in A$, $q(\xi_1, \ldots, \xi_m) \in T_\Omega(Y \cup \Xi_m)$).


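Given an F-transducer $\mathbf{A} = (\Sigma, X, A, \Omega, Y, P, A')$, the transformation
induced by $\mathbf{A}$ is the relation


$\tau_{\mathbf{A}} = \{(p, q) \mid p \in T_\Sigma(X), q \in T_\Omega(Y), aq \in p\tau'_{\mathbf{A}} \text{ for some } a \in A'\}$ (3)


For every $p \in T_\Sigma(X \cup A\Xi)$, $p\tau'_{\mathbf{A}}$ is the subset of $AT_\Omega(Y \cup \Xi)$ given as
follows:


1. if $p = a\xi$ ($a \in A$, $\xi \in \Xi$), then $a\xi \in p\tau'_{\mathbf{A}}$;

2. if $p \in \Sigma_0 \cup X$, then $aq \in p\tau'_{\mathbf{A}}$ for all $(p, aq) \in P$;

3. if $p = \sigma(p_1, \ldots, p_m)$ ($\sigma \in \Sigma_m$, $m > 0$), then $aq(q_1, \ldots, q_m) \in p\tau'_{\mathbf{A}}$ for all
$(\sigma(a_1, \ldots, a_m), aq) \in P$ and $a_i q_i \in p_i\tau'_{\mathbf{A}}$ ($a, a_i \in A$, $i = 1, \ldots, m$);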

4. nothing is in any $p\tau'_{\mathbf{A}}$ unless this follows from (1)-(3).

Example 7. Let $\mathbf{A} = (\Sigma, \{x\}, \{a_0, a_1\}, \Omega, \{y\}, P, \{a_0\})$, where $\Sigma = \Sigma_2 =
\{\sigma\}$, $\Omega = \Omega_1 = \{\omega\}$ and $P$ consists of the productions $x \to a_1 y$ and
$\sigma(a_1, a_1) \to a_0\omega(\xi_1)$.

Consider the tree $\sigma(x, x)$. One of the possible derivations,


$\sigma(x, x) \Rightarrow \sigma(a_1 y, x) \Rightarrow \sigma(a_1 y, a_1 y) \Rightarrow a_0\omega(y)$,


is illustrated in Figure 2.


Fig. 2. A transformation by an F-transducer
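
The derivation of Example 7 can be reproduced by a bottom-up evaluation that returns, for each input tree, the set of pairs (state, output tree) it can be rewritten to. The sketch below hard-codes the two productions of Example 7; the tuple encoding of trees and the names `f_states` and `f_transform` are assumptions made for this illustration.

```python
FINAL = {"a0"}  # final states A' of Example 7


def f_states(tree) -> set:
    """Pairs (state, output tree) derivable from `tree`, computed from the
    frontier towards the root."""
    if tree == "x":
        return {("a1", "y")}                      # x -> a1 y
    symbol, left, right = tree
    results = set()
    if symbol == "sigma":
        for state_left, out_left in f_states(left):
            for state_right, _out_right in f_states(right):
                if (state_left, state_right) == ("a1", "a1"):
                    # sigma(a1, a1) -> a0 omega(xi_1): the second subtree is deleted
                    results.add(("a0", ("omega", out_left)))
    return results


def f_transform(tree) -> set:
    """Output trees q such that (tree, q) is in the induced transformation."""
    return {out for state, out in f_states(tree) if state in FINAL}
```

On the input `("sigma", "x", "x")` the only output is `("omega", "y")`, which corresponds to $\omega(y)$ in the derivation above.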


Definition 17. A root-to-frontier tree transducer (R-transducer) is a system
$\mathbf{A} = (\Sigma, X, A, \Omega, Y, P, A')$, where $\Sigma$, $X$, $A$, $\Omega$ and $Y$ are specified in the same
way as for an F-transducer, $A' \subseteq A$ is the set of initial states, and $P$ is a finite set of
productions of the following two types:


1. $ax \to q$ ($a \in A$, $x \in X$, $q \in T_\Omega(Y)$);
2. $a\sigma(\xi_1, \ldots, \xi_m) \to q$ ($a \in A$, $\sigma \in \Sigma_m$, $m \geq 0$, $q \in T_\Omega(Y \cup A\Xi_m)$).


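Given an R-transducer $\mathbf{A} = (\Sigma, X, A, \Omega, Y, P, A')$, the transformation
induced by $\mathbf{A}$ is the relation


$\tau_{\mathbf{A}} = \{(p, q) \mid p \in T_\Sigma(X), q \in T_\Omega(Y), q \in p\tau_{\mathbf{A},a} \text{ for some } a \in A'\}$ (4)


For any $a \in A$ and $p \in T_\Sigma(X)$, $p\tau_{\mathbf{A},a}$ is defined as follows:


1. if $p \in \Sigma_0 \cup X$ and $(ap, q) \in P$, then $q \in p\tau_{\mathbf{A},a}$;

2. if $p = \sigma(p_1, \ldots, p_m)$ ($\sigma \in \Sigma_m$, $m > 0$), then for any $(a\sigma, q(\mathbf{a}_1\xi_1^{n_1}, \ldots, \mathbf{a}_m\xi_m^{n_m}))
\in P$ and $q_{ij} \in p_i\tau_{\mathbf{A},a_{ij}}$ ($1 \leq i \leq m$, $1 \leq j \leq n_i$), $q(\mathbf{q}_1, \ldots, \mathbf{q}_m) \in
p\tau_{\mathbf{A},a}$, where $\mathbf{q}_i = (q_{i1}, \ldots, q_{in_i})$ ($i = 1, \ldots, m$);

3. nothing is in any $p\tau_{\mathbf{A},a}$ unless this follows from (1)-(2).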


Example 8. Let $\mathbf{A} = (\Sigma, \{x\}, \{a_0, a_1, a_2\}, \Omega, \{y_1, y_2\}, P, \{a_0\})$ be an R-transducer,
where $\Sigma = \Sigma_1 = \{\sigma\}$, $\Omega = \Omega_1 \cup \Omega_2$, $\Omega_1 = \{\omega_1\}$, $\Omega_2 = \{\omega_2\}$ and $P$
consists of the productions $a_0\sigma \to \omega_2(a_1\xi_1, a_2\xi_1)$, $a_1\sigma \to \omega_1(a_1\xi_1)$, $a_2\sigma \to
\omega_1(a_2\xi_1)$, $a_1 x \to y_1$, $a_2 x \to y_2$. Consider the trees $p = \sigma(\sigma(\sigma(x)))$ and
$q = \omega_2(\omega_1(\omega_1(y_1)), \omega_1(\omega_1(y_2)))$. Then a derivation of $q$ from $a_0 p$ is shown
in Figure 3.


Fig. 3. A transformation by an R-transducer
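
The derivation of Example 8 can likewise be reproduced by a top-down recursion that starts at the root in the initial state. The sketch below hard-codes the five productions of Example 8; the tuple encoding of trees and the name `r_transform` are assumptions made for this illustration.

```python
def r_transform(state: str, tree) -> set:
    """Output trees derivable from `tree` when it is processed in `state`,
    working from the root towards the frontier."""
    if tree == "x":
        # a1 x -> y1 and a2 x -> y2; a0 has no production for x
        return {"a1": {"y1"}, "a2": {"y2"}}.get(state, set())
    _symbol, child = tree  # sigma is the only operator and it is unary
    if state == "a0":
        # a0 sigma -> omega2(a1 xi_1, a2 xi_1): the subtree is copied
        return {
            ("omega2", left, right)
            for left in r_transform("a1", child)
            for right in r_transform("a2", child)
        }
    # a1 sigma -> omega1(a1 xi_1) and a2 sigma -> omega1(a2 xi_1)
    return {("omega1", out) for out in r_transform(state, child)}


P_TREE = ("sigma", ("sigma", ("sigma", "x")))
Q_TREE = ("omega2", ("omega1", ("omega1", "y1")), ("omega1", ("omega1", "y2")))
```

Evaluating `r_transform("a0", P_TREE)` yields the single tree `Q_TREE`, which illustrates how a root-to-frontier transducer can copy a subtree and process the copies in different states.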


3.3 Some classes of tree transformations 


Let $\mathbf{A} = (\Sigma, X, A, \Omega, Y, P, A')$ be an F-transducer. Then:


1. A production of $\mathbf{A}$ is linear if each auxiliary variable occurs at most once
in it. Moreover, $\mathbf{A}$ is a linear F-transducer (LF-transducer) if all of its
productions are linear.

2. $\mathbf{A}$ is a totally defined F-transducer (TF-transducer) if
(i) for each $x \in X$ there is a production in $P$ with left-hand side $x$ and
(ii) for all $m \geq 0$, $\sigma \in \Sigma_m$ and $a_1, \ldots, a_m \in A$ there is a production in $P$
with left-hand side $\sigma(a_1, \ldots, a_m)$.

3. $\mathbf{A}$ is a nondeleting F-transducer (NF-transducer) if for every production
$\sigma(a_1, \ldots, a_m) \to aq$ ($\sigma \in \Sigma_m$, $m > 0$) from $P$ each $\xi_i \in \Xi_m$ occurs at
least once in $q$.

4. $\mathbf{A}$ is a deterministic F-transducer (DF-transducer) if there are no two
distinct productions in $P$ with the same left-hand side.

5. $\mathbf{A}$ is an F-relabeling if each of its productions is of the form
(i) $x \to ay$ ($x \in X$, $a \in A$, $y \in Y$) or
(ii) $\sigma(a_1, \ldots, a_m) \to a\omega(\xi_1, \ldots, \xi_m)$, where $\sigma \in \Sigma_m$, $a_1, \ldots, a_m, a \in
A$, $\omega \in \Omega_m$.


Example 9. Let $\mathbf{A} = (\Sigma, \{x\}, \{a_0, a_1\}, \Omega, \{y\}, P, \{a_1\})$ be the F-transducer
with $\Sigma = \Sigma_2 = \{\sigma\}$ and $\Omega = \Omega_2 = \{\omega\}$, where $P$ consists of the productions
$x \to a_0 y$, $\sigma(a_0, a_0) \to a_1\omega(\xi_1, \xi_2)$, $\sigma(a_0, a_1) \to a_0\omega(\xi_1, \xi_2)$, $\sigma(a_1, a_0) \to
a_1\omega(\xi_1, \xi_2)$, $\sigma(a_1, a_1) \to a_1\omega(\xi_1, \xi_2)$.

Then $\mathbf{A}$ is a linear, totally defined, nondeleting, and deterministic
F-transducer. $\mathbf{A}$ is also an F-relabeling.

Summary. This paper presents an overview of General Natural Syntax (GNS), a
formal theory of general explicative power that generalizes and formalizes syntactic 
concepts in order to offer a general notion of syntax that is independent of any 
particular language. 


1 General Natural Syntax: Basic Idea 


General Natural Syntax (GNS) is a formal theory of general explicative power 
that intends to formalize a general notion of syntax that is independent of 
any particular language. The idea behind this formalism is that it reduces 
syntactic constructions to a few principles related to their semantic functions, 
but which can be defined independently of semantics. 

GNS generalizes and formalizes syntactic concepts so that they can be in- 
corporated into a formal theory of general explicative power. As we have said, 
the general principles we proposed as elements of GNS are related to their 


56 V. Manca, M.D. Jiménez-López


semantic function but definable independently of semantics. GNS represen- 
tations stand between the concrete syntax of specific natural languages and 
the semantic function of syntactic phenomena. In this regard, GNS can be re- 
lated to the so-called abstract syntax of programming languages [4]. Abstract 
syntax is an important tool in the translation process between programming 
languages: it provides a deep description of syntactic constructs and is in- 
dependent of any particular syntactic encoding. The starting point for this 
approach was an algebraic notion of syntax, more abstract than a concrete 
syntax, placed at an intermediate level as a bridge between syntax and seman- 
tics. Symbols of an abstract syntax expression denote semantic functions, but 
at the same time, all the information is included so that the concrete syntax 
of the expression can be generated. 

GNS is also related to the notion of tagged text, which is the basis of many
markup languages (TeX, HTML, XML). Here too, abstract expressions mix pure
textual information with information about text format and visualization
(fonts, dimensions, paragraphs, etc.). The structure of a text is thus
represented in a way that is largely independent of how the text is rendered
by a compiler or a browser. Marked texts abstractly express properties and
relations of the textual units at a logical level.

Considering all the above ideas and taking into account that the use of 
algebraic formalization has a strong and deep tradition in natural language 
analysis within the fields of semantics [15, 14] and morphology [13], what 
we propose here is to develop an analogous logic-algebraic approach at an 
intermediate level between semantics and morphology: the level of the abstract 
or general syntax.


2 Formal Prerequisites 


In this section we provide the formal prerequisites that are needed to under- 
stand the formalization presented in this paper. For further information on 
the theory of formal languages and mathematical logic see [17, 16, 2]. 

A nonempty set V of symbols or letters is called an alphabet. A word or
a string over an alphabet V is an element of the free monoid V* generated
by the symbols of V under a binary associative operation of concatenation
(denoted by juxtaposition). The empty word λ is the neutral element with
respect to concatenation (xλ = λx = x). The length of a string x ∈ V* (the
number of symbol occurrences in x) is denoted by |x|.

We use the symbols →, ¬, ∧, ∨, ↔, ∀, ∃, =, ≠ with the standard syntactic and
semantic first order logic meaning. We assume that the reader is familiar with
logical notions such as variables, constants, predicates, functors, first order
formulae and terms, and free and bound variables. A set of formulae, called
axioms, and all their logical consequences constitute a theory (with respect to
some notion of logical consequence or of logical deduction).

A model M, or a (relational) structure, is given by: i) a set D, called 
domain of M, ii) some elements a,b,... € D, called individual constants of 
M, and iii) some operations f,g,... and relations R,Q,... over D (an arity 
is associated with each operation and relation and it specifies its number of 
arguments). Usually M is indicated by: 


M = (D, a, b, ..., f, g, ..., R, Q, ...)


The set Term(M) of the terms over M is given by all the expressions that 
can be constructed, in the usual algebraic sense, by applying the operations 
of M to the individual constants of M. For example, if f has arity 1 and g 
has arity 2, then the following are terms over the model given above: 


f(a), g(a, b), f(g(a, b)), g(a, f(a)), f(g(a, f(a))).


An equation such as g(a, f(a)) = b means that by applying the operation 
f to the constant a we get an element of D, say c, and by applying g to 
the pair a and c we get b. This means that g(a, f(a)) is considered as the 
denotation of the element of D obtained by applying the operations following 
the way algebraic expressions are usually evaluated (in the order specified 
by parentheses). However, we can consider the term as denoting itself, i.e. a 
sequence of symbols for individual constants, operations, commas, and paren- 
theses, disregarding any meaning of symbols. It is important to distinguish 
between these two aspects. If we want to be precise we write g(a, f(a)) to 
refer to the element of D (if it exists) denoted by the term, while in the other 
case we write [g(a, f(a))]. However, in practice the context will indicate in 
which sense a term is used. 
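
The two readings of a term can be illustrated with a minimal Python sketch. The model below is purely hypothetical (the integers as domain, a = 1, b = 3, f the successor and g the addition) and is chosen only so that the equation g(a, f(a)) = b holds; the function names evaluate and show are likewise assumptions made for this illustration.

```python
from typing import Callable, Dict, Tuple, Union

# A term is either a constant name or a pair (operation name, argument terms).
Term = Union[str, Tuple[str, Tuple["Term", ...]]]

# Hypothetical model M: D = integers, constants a and b, operations f and g.
CONSTANTS: Dict[str, int] = {"a": 1, "b": 3}
OPERATIONS: Dict[str, Callable[..., int]] = {
    "f": lambda n: n + 1,      # arity 1
    "g": lambda m, n: m + n,   # arity 2
}


def evaluate(term: Term) -> int:
    """First reading: the element of D denoted by the term."""
    if isinstance(term, str):
        return CONSTANTS[term]
    op, args = term
    return OPERATIONS[op](*(evaluate(arg) for arg in args))


def show(term: Term) -> str:
    """Second reading: the term as a bare sequence of symbols."""
    if isinstance(term, str):
        return term
    op, args = term
    return f"{op}({', '.join(show(arg) for arg in args)})"


# The term g(a, f(a)) built as a nested structure.
t: Term = ("g", ("a", ("f", ("a",))))
```

In this toy model evaluate(t) yields the same element as the constant b, while show(t) returns the string of symbols "g(a, f(a))" without consulting the model at all.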

A string model or a monoidal model M is a model that has: i) a domain
that includes a free monoid V* over an alphabet V, ii) constants that include
a constant λ for the empty string and constants for the symbols of V, iii)
operations that include a binary operation that on V* coincides with the
string concatenation on V* (indicated by juxtaposition).

A signature Σ is essentially a set of symbols for denoting objects
(individual constants), operations and relations of a relational structure
(relations can be identified as operations that provide Boolean values as
results). In Σ, any symbol of operation or relation is equipped with an arity
that specifies the number of arguments of the corresponding operation or
relation.

A string theory or a monoidal theory T, over an alphabet A and a signature
Σ, is a first order logic theory of axioms Φ over the signature Σ such that all
the symbols of A and the empty string λ are individual constants of Σ. The
concatenation operation of monoids is denoted in Σ, and Φ also includes the
monoid axiom: ∀x, y, z (x(yz) = (xy)z ∧ xλ = x ∧ λx = x). In other words,
a monoidal theory over an alphabet A and a signature Σ is a theory where
terms include the strings over A. This means that concatenation is applied not
only to the symbols of A, but also to the terms that can be built on the
signature Σ. This possibility provides syntactic constructs where not only
symbols of an alphabet, but even terms, are concatenated.
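
As a small sanity check, the monoid axiom can be tested on a finite sample of a concrete string model. The sketch below assumes Python strings as the free monoid, the empty Python string as the empty word, and a hypothetical two-letter alphabet; a finite check of this kind illustrates the axiom but of course does not prove it.

```python
from itertools import product

ALPHABET = ("a", "b")   # hypothetical alphabet
LAMBDA = ""             # the empty string


def words_up_to(n):
    """All words over ALPHABET of length at most n."""
    return ["".join(p)
            for k in range(n + 1)
            for p in product(ALPHABET, repeat=k)]


def monoid_axiom_holds(words):
    """Check associativity and both unit laws on a finite sample of words."""
    for x, y, z in product(words, repeat=3):
        if x + (y + z) != (x + y) + z:
            return False
    return all(x + LAMBDA == x and LAMBDA + x == x for x in words)
```

Calling monoid_axiom_holds(words_up_to(2)) checks the axiom on the seven words of length at most two.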

A monoidal system or simply a monoidal indicates, generically, a string 
model or a string theory. Monoidals are good environments for defining the 
syntax of formal languages [11]. 

Let us indicate by GNS(L) the terms, with their syntactic category, that 
can be constructed starting from the lexical items of L in a suitable string the- 
ory that will be defined in the course of the paper. The alphabet of this theory 
is the set of usual Latin letters plus other special symbols that will appear 
in the axioms given in the next sections. Variables are indicated by letters
x, y, z, u, v, ..., which may have superscripts or subscripts. Individual
constants are indicated by special strings of capital letters. Operations and
predicates are indicated by strings starting with a backslash (\). For
readability, given a predicate P we write t : P (t is of type P) instead of
P(t); intuitively, t : P means that term t belongs to the syntactic category P.


3 General Natural Syntax: the Formalism 


GNS has been defined as a general formal framework given by some axioms 
according to which some operations are applied to strings of some categories 
and get other strings (of some specified categories). The formalism consists 
of: 


• Eight basic categories;
• Thirty syntactic operations;
• Forty axioms;
• A few hundred grammemes;
• A few thousand lexical items (basic lexicon).


In what follows we will present the above elements in order to provide an 
overview of GNS. 


3.1 Three basic operations 


In order to construct GNS, three basic operations have been assumed as the 
preliminary data of the analysis: 


1. Conjugation (\conjug): adds temporal and dynamic parameters to syn- 
tactic elements that can play verbal roles. 

2. Determination (\determ): adds spatial and contextual information to el- 
ements that can play nominal roles. 

3. Predication (\pred): is the basic sentence building construction. 


We assume that we know when these operations can be applied to some
arguments.


3.2 Eight basic categories 


Taking into account the three basic operations introduced in the above section 
we consider the following eight basic categories: 


1. Verb (Verb): We define the class Verb of elements x such that, for some 
conjugative y, \conjug(x, y) provides a result. Formally: 


Verb = {x | ∃y z \conjug(x, y) = z, y ∈ Conjugative}


2. Noun (Noun): We define the class Noun of elements x, such that, for 
some determinative y, \determ(x, y) provides a result. Formally: 


Noun = {x | ∃y z \determ(x, y) = z, y ∈ Determinative}


3. Substantive (Subst): A substantive is the result of the determination 
operation. Formally: 


∀xyz(\determ(x, y) = z ∧ y : Determinative → z : Subst)

4. Verbative (Verbt): A verbative is the result of the conjugation operation.
Formally:


∀xyz(\conjug(x, y) = z ∧ y : Conjugative → z : Verbt)


5. Proposition (Prop): A proposition is the result of a \pred operation
that takes as arguments a substantive (Subst) and a verbative (Verbt)
and provides a proposition as a result. Formally:


Prop = {z | ∃x y \pred(x, y) = z, x ∈ Subst, y ∈ Verbt}


We write x : Cat to state that x is an expression of category Cat. We use
Cat to indicate any of the previous categories. To those five basic categories
we need to add the following three Ad categories (AdCat indicates any of
them):


6. AdProp 
7. AdNoun 
8. AdVerb 


It is important to note here that although the terms proposition, noun, 
verb, and substantive are taken from traditional grammatical, logical, and 
semantical analysis, the definition we provide of our basic types is completely 
formal, based on the assumption of some initial operations. 
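
The way the five basic categories arise from the three initial operations can be sketched as a small category checker. The sketch below is only an illustration under several assumptions: the toy lexicon, the grammeme sets and all Python names are hypothetical, and the lexicon assigns Noun and Verb directly, whereas the text defines these classes through the applicability of \determ and \conjug.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    text: str   # the abstract-syntax term written as a string
    cat: str    # its syntactic category


# Hypothetical toy lexicon and grammeme classes (not the GNS lexicon).
LEXICON = {"dog": "Noun", "bark": "Verb"}
CONJUGATIVE = {"I", "II", "III", "SING", "PLUR", "PRES", "PAST"}
DETERMINATIVE = {"DEF", "INDEF"}


def lex(word):
    """Look up a lexical item together with its category."""
    return Term(word, LEXICON[word])


def _apply(name, x, grammemes, allowed, in_cat, out_cat):
    # Reject arguments of the wrong category or grammemes of the wrong class.
    if x.cat != in_cat:
        raise TypeError(f"\\{name} expects {in_cat}, got {x.cat}")
    if not grammemes or not set(grammemes).issubset(allowed):
        raise TypeError(f"\\{name}: unexpected grammemes {grammemes}")
    return Term(f"\\{name}({x.text}, {', '.join(grammemes)})", out_cat)


def conjug(x, *grammemes):
    """Verb plus conjugative grammemes gives a Verbt."""
    return _apply("conjug", x, grammemes, CONJUGATIVE, "Verb", "Verbt")


def determ(x, *grammemes):
    """Noun plus determinative grammemes gives a Subst."""
    return _apply("determ", x, grammemes, DETERMINATIVE, "Noun", "Subst")


def pred(u, v):
    """Subst plus Verbt gives a Prop."""
    if (u.cat, v.cat) != ("Subst", "Verbt"):
        raise TypeError("\\pred expects a Subst and a Verbt")
    return Term(f"\\pred({u.text}, {v.text})", "Prop")
```

With these definitions, pred(determ(lex("dog"), "DEF"), conjug(lex("bark"), "III", "SING", "PRES")) is accepted and has category Prop, while pred(lex("dog"), lex("bark")) is rejected because neither argument has yet been determined or conjugated.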


3.3 Grammemes 


The eight basic categories above are those of full linguistic elements. To them
we add a category of empty linguistic elements that we have called
grammemes³. Grammemes determine the surface syntactic realization of the
abstract syntax operations considered in GNS.

Syntactic operations defined in GNS allow us to construct complex
linguistic expressions with one of the eight basic categories described above.
However, many syntactic operations need some additional parameters, which
we have called grammemes. Grammemes are defined as elements that
individuate the features required to evaluate terms of GNS(L) into expressions
of L: that is, kinds of information that the grammar of L requires to provide
the morphological realization corresponding to the abstract syntactic
representation.


³ Note that we are not using ‘grammeme’ here with its usual meaning in linguistic
morphology.

It is important to distinguish grammemes from particles. Particles are
strings that are inserted in a text, in the surface syntactic realization of
the abstract syntax operations, as a consequence of applying a syntactic
operation with some grammeme. Examples of grammemes are: singular (SING),
plural (PLUR), present (PRES), present-progressive (PPG), past (PAST),
disjunctive (OR), definite (DEF), agent (AGENT), and first, second, third
(I, II, III). The difference between grammemes and particles can be observed
in the following English example, where III, SING, PPG and PAST are
grammemes and ‘was’ is a particle that appears only in the surface string:
\conjug(go, III, SING, PPG, PAST) = ‘was going’.
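
The relation between grammemes and particles can be made concrete with a toy realization function. This is a sketch under stated assumptions, not part of GNS: the table BE_FORMS and the function realize_conjug are hypothetical, only the progressive is covered, and the ‘-ing’ form is built by plain suffixation, which is adequate for ‘go’ but not for English verbs in general.

```python
# Hypothetical fragment of English concrete syntax: forms of 'be' indexed by
# (tense, person, number) grammemes.
BE_FORMS = {
    ("PAST", "III", "SING"): "was",
    ("PRES", "III", "SING"): "is",
}


def realize_conjug(verb, person, number, aspect, tense):
    """Surface string for \\conjug(verb, person, number, aspect, tense)."""
    if aspect != "PPG":
        raise NotImplementedError("only the progressive is sketched here")
    # The grammemes select the particle; they do not appear in the output.
    particle = BE_FORMS[(tense, person, number)]
    return f"{particle} {verb}ing"
```

Here realize_conjug("go", "III", "SING", "PPG", "PAST") returns "was going": the grammemes are consumed by the lookup, and only the particle and the inflected lexical item reach the surface string.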

Grammemes are classified into several types. Although these types depend
on the particular language, we believe that they vary very little even within
wide classes of languages. What is essential in the notion of grammemes is
that they are a “closed set”, which is a fixed and small number of elements not
exceeding a few hundred units (apart from numeral and ordinal elements).
Classes of grammemes in English, Italian, Spanish, and many other languages
are listed in Table 1.

Note that some of the above classes consist of a sequence of grammemes, 
which depends on the language we are considering. For example, the class con- 
jugative consists of a sequence which, depending on the language, may include 
types such as personal, number, gender, tense, modal-attitudinal, aspect... 


3.4 Operations: Generalities 


The syntactic operations we defined in GNS are related to fundamental se- 
mantic roles, but only the categories of their arguments and the categories of 
their results are essential in their determination. In this regard, a general syn- 
tactic construction is a kind of bridge between syntax and semantics. It does 
not deal with the particular morphological features of the final linguistic form 
of the expression that is the result of applying the syntactic operation, but at 
the same time, it does not concern the meaning of the final expressions and 
the way this meaning is related to those of the arguments of the operation. 
Therefore, a syntactic operation is defined by: 


i) its name, 
ii) the categories of its arguments, and 
iii) the category of the resulting expression.


CLASSES OF GRAMMEMES
 1. Conjugative:
      - Personal
      - Number
      - Tense
      - Mood
      - Gender
      - Aspect
      - Modal-Attitudinal
 2. Determinative
 3. Copulative
 4. Circumstantial
 5. Complementative
 6. Locative
 7. Cardinal
 8. Ordinal
 9. Deictic
10. Quantifier
11. Intensive
12. Comparative
13. Coordinative
14. Subordinative

Table 1. Classes of Grammemes


The syntactic operations defined can be classified as constructs of one of 
three general schemata: 


1. Combination Schema puts together strings of different categories and pro- 
vides a string of a category other than the categories of its components. 

2. Expansion Schema takes as its argument a string of a given category and 
provides a bigger string of the same category. 

3. Transcategorization Schema transforms a string of a given category into 
a string of another category. 
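
Since an operation is characterized by its name, its argument categories and its result category, the three schemata can be read off a signature. The sketch below is one possible reading, stated as an assumption: grammeme parameters are ignored, an operation whose result has the category of its first argument counts as an expansion, a one-argument operation with a different result category counts as a transcategorization, and everything else as a combination. The three sample signatures follow Axioms 3, 9 and 23 of Section 3.5 (taking the Noun alternative of apposition); the Python names are hypothetical.

```python
from typing import NamedTuple, Tuple


class Signature(NamedTuple):
    name: str
    args: Tuple[str, ...]   # categories of the full (non-grammeme) arguments
    result: str             # category of the resulting expression


def schema(sig: Signature) -> str:
    """Classify a signature under the reading described in the lead-in."""
    if sig.result == sig.args[0]:
        return "expansion"
    if len(sig.args) == 1:
        return "transcategorization"
    return "combination"


PRED = Signature("pred", ("Subst", "Verbt"), "Prop")
APPOS = Signature("appos", ("Subst", "Noun"), "Subst")
ADVERB = Signature("adverb", ("AdNoun",), "AdVerb")
```

Under this reading predication is a combination, apposition an expansion and adverbialization a transcategorization, which agrees with how the text describes these three operations.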


Syntactic operations defined in GNS are listed in Table 2.


N.   OPERATION          NOTATION
1    Conjugation        \conjug
2    Determination      \determ
3    Predication        \pred
4    Copulation         \copul
5    Modification       \modif
6    Apposition         \appos
7    Complementation    \complem
8    Localization       \loc
9    Coordination       \coord
10   Subordination      \subord
11   Negation           \neg
12   Passivation        \passiv
13   Adjectivation      \adj
14   Adverbialization   \adverb
15   Adproposition      \adprop
16   Relativization     \relat
17   Anacoluthon        \anacoluth
18   Numeralization     \num
19   Ordinalization     \ord
20   Deictification     \deixis
21   Anaphorization     \anaphor
22   Quantification     \quant
23   Intensification    \intens
24   Comparison         \compar
25   Correlation        \correl
26   Substantivation    \substantiv
27   Nominalization     \nomin
28   Quotation          \quot
29   Interrogation      \interr
30   Imperativization   \imper

Table 2. GNS Operations


3.5 Axioms


GNS is given by some axioms according to which operations are applied to
strings of some categories and produce other strings (of some specified
categories). In this section we introduce the forty axioms related to the above
thirty syntactic operations. Some examples of their functioning are also
provided.

We start by providing the first five axioms of our model according to which
the operations of conjugation, determination and predication are applied.


Axiom 1: Conjugation
∀xyz(\conjug(x, y) = z ∧ y : Conjugative → z : Verbt)


The type Conjugative refers to the conjugation parameters that in English, 
Spanish and Italian are given by Tense, Mood, Person, Number. Examples of 
conjugative grammemes are: PRES (present), PAST (past), FUT (future), PPG 
(present-progressive), IND (indicative), SUB (subjunctive), PART (participle), 
I (first), II (second), III (third), SING (singular), PLUR (plural), NEC (neces- 
sity), PERF (perfect), IMP (imperfect), MASC (masculine), FEM (feminine), NEUT 
(neuter). 


Axiom 2: Determination 
∀xyz(\determ(x, y) = z ∧ y : Determinative → z : Subst).


The basic determinative grammemes used in determination are DEF
(definite/near) and INDEF (indefinite/far). In many languages they need to be
accompanied by grammemes of number (SING (singular), PLUR (plural)) and of
gender (MASC (masculine), FEM (feminine), NEUT (neuter)). In other languages
the set of determinatives could be richer and related to an intrinsic mechanism
of noun classification.

Predication is the basis of the grammatical schema that provides a sentence 
by combining a ‘subject’ and a ‘predicate’. The following axioms show how to 
apply this operation. 


Axiom 3: Predication


∀x(x : Prop → ∃u,v(u : Subst ∧ v : Verbt ∧ x = \pred(u, v)))


Axiom 4: Predication
∀uvwz((u : Subst ∧ v : Verbt ∧ w : Verb ∧ v ≠ z ∧ z : Conjugative ∧ v =
\conjug(w, z) ∧ Agree(u, v)) → \pred(u, v) : Prop)


Axiom 5: Predication
∀xuv(x : Prop ∧ x = \pred(u, v) → v = PREDICATE(x) ∧ u = SUBJ(x))


Knowing how conjugation, determination and predication work, we can
consider the sentence: ‘The dog barks’. It can be formalized as follows (by
DEF we mean the definite determiner):


\pred(\determ(dog, DEF), \conjug(bark, III, SING, PRES))


The deep linguistic level of abstract syntax representations is easily un- 
derstood if we consider the Italian and Spanish translations of the above 
sentence: ‘Il cane abbaia’ and ‘El perro ladra’, respectively, whose abstract 
syntax representations are the following: 


\pred(\determ(cane, DEF), \conjug(abbaiare, III, SING, PRES))
\pred(\determ(perro, DEF), \conjug(ladrar, III, SING, PRES))


where we can see that these two representations can be obtained from the 
English one by replacing the lexical items dog and bark with the corresponding 
lexical items in Italian and Spanish. 
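
The replacement of lexical items described above can be sketched as a recursive substitution over terms. In this minimal Python illustration, terms are nested tuples whose first component is the operation name; the two bilingual lexicons are hypothetical and cover only this example, and the function names are assumptions made for the sketch.

```python
# Terms as nested tuples (operation, arg1, arg2, ...); leaves are strings.
ENGLISH = ("pred",
           ("determ", "dog", "DEF"),
           ("conjug", "bark", "III", "SING", "PRES"))

# Hypothetical bilingual lexicons covering only this example.
EN_IT = {"dog": "cane", "bark": "abbaiare"}
EN_ES = {"dog": "perro", "bark": "ladrar"}


def substitute(term, lexicon):
    """Replace lexical items, leaving operations and grammemes untouched."""
    if isinstance(term, str):
        return lexicon.get(term, term)
    op, *args = term
    return (op, *(substitute(arg, lexicon) for arg in args))


def show(term):
    """Print a term in the backslash notation used in the text."""
    if isinstance(term, str):
        return term
    op, *args = term
    return "\\" + op + "(" + ", ".join(show(arg) for arg in args) + ")"
```

Applying show(substitute(ENGLISH, EN_IT)) gives the Italian representation shown above, and EN_ES gives the Spanish one; the operations and the grammemes DEF, III, SING and PRES are shared by all three terms.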


Copulation is an operation that transforms a Noun, an AdNoun, or a
Subst into a Verb. Axiom 6 provides the requirements to apply this operation.


Axiom 6: Copulation 


∀xy((x : Noun ∨ x : AdNoun ∨ x : Subst) ∧ y : Copulative → \copul(x, y) :
Verb)


Copulative grammemes are BE, BECOME, SEEM... Examples of copulative 
constructions are the following: 


• ‘was a lawyer’: \conjug(\copul(lawyer, BE), III, SING, PAST)
• ‘became president’: \conjug(\copul(president, BECOME), III, SING, PAST)


Modification (\modif) is the combination of a phrase of category AdCat 
with a phrase of a category Cat. Modification introduces the notions of kernel 
and modifier. Axioms 7 and 8 formalize this operation: 


Axiom 7: Modification 


∀xy(x : Cat ∧ y : AdCat ∧ Agree(x, y) → \modif(x, y) : Cat)


Axiom 8: Modification 


∀x(∃uv : u : Cat ∧ v : AdCat ∧ x = \modif(u, v) → KER(x) =
u ∧ MODIFIER(x) = v)


The following examples show how modification works in GNS:

• ‘young artist’: \modif(artist, young)
• ‘to walk slowly’: \modif(walk, \adverb(slow))


Apposition (\appos) is an expansion of a Subst with a Noun or an
AdNoun. The Noun or AdNoun does not contribute to the determination of the
substantive; it only adds descriptive aspects of what is identified by the
substantive. Axiom 9 accounts for this operation.


Axiom 9: Apposition 

∀xy(x : Subst ∧ (y : Noun ∨ y : AdNoun) → \appos(x, y) : Subst)
An example of apposition is the following:
• ‘Rome, the capital of Italy’: \appos(Rome, capital of Italy)


Two basic axioms that refer to agreement features are the following. Note 
that axiom 10 axiomatizes the requirements for the relation Agree that guar- 
antees the presence of some common features between the subject and the 
predicate of a predication. Axiom 11 concerns some commutativity require- 
ments between the kernel of a modification and the agreement features of a 
verb or a substantive. We indicate by Feature a typical feature (e.g. gender, 
number, person, ...) 


Axiom 10: Agreement 


∀xyz(\pred(x, y) = z → (x : Subst ∧ y : Verb ∧ Agree(x, y)))


Axiom 11: Agreement 


∀xy(x : Cat ∧ y : AdCat → Feature(x) = Feature(\modif(x, y)))


Axioms 12, 13, 14 and 15 show how complementation is applied.
Complementation (\complem) is an operation in which a Noun or a Verb is
expanded either with a Subst or with a Noun. This means that we can identify
four possible types of complementation:


1. \complem_VS (a verb with a substantive);
2. \complem_NS (a noun with a substantive);
3. \complem_VN (a verb with a noun);
4. \complem_NN (a noun with a noun).


These four types of complementation are formalized in the following axioms:


Axiom 12: Complementation 


∀xyz(x : Verb ∧ y : Subst ∧ z : Complementative → \complem_VS(x, z, y) :
Verb)


Axiom 13: Complementation


∀xyz(x : Noun ∧ y : Subst ∧ z : Complementative → \complem_NS(x, z, y) :
Noun)


Axiom 14: Complementation


∀xyz(x : Verb ∧ y : Noun ∧ z : Complementative → \complem_VN(x, z, y) :
Verb)


Axiom 15: Complementation


∀xyz(x : Noun ∧ y : Noun ∧ z : Complementative → \complem_NN(x, z, y) :
Noun)


Complementatives are grammemes that identify the roles of
complementations. They correspond to the cases typical of many languages
(e.g. Latin, Greek, Russian, ...) and can be represented with specific
grammemes related to a few main functionalities: AGT (agent), INST
(instrument), GOAL (goal), MANN (manner), MATT (matter), OWN (owner), REF
(referent), REC (receiver), UNI (union), CONT (content), PURP (purpose), and
CAUS (cause).

Examples of complementation are the following: 


• ‘Jane eats the cracker’: \complem_VS(eats, the cracker).


• ‘cup of tea’: \complem_NN(cup, CONT, tea).
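
Axioms 12 to 15 differ only in the categories of the two expressions involved, so the choice among the four variants can be sketched as a dispatch on those categories. In the following Python illustration, expressions are (text, category) pairs; the function complem and the table SUBSCRIPT are hypothetical names, and the categories of ‘cup’ and ‘tea’ are supplied by hand.

```python
COMPLEMENTATIVE = {"AGT", "INST", "GOAL", "MANN", "MATT", "OWN",
                   "REF", "REC", "UNI", "CONT", "PURP", "CAUS"}
SUBSCRIPT = {"Verb": "V", "Noun": "N", "Subst": "S"}


def complem(x, z, y):
    """Apply the complementation variant selected by the categories of x, y.

    x and y are (text, category) pairs; z is a complementative grammeme.
    The result keeps the category of the expanded element x.
    """
    (x_text, x_cat), (y_text, y_cat) = x, y
    if x_cat not in ("Verb", "Noun") or y_cat not in ("Subst", "Noun"):
        raise TypeError("no complementation axiom applies")
    if z not in COMPLEMENTATIVE:
        raise TypeError(f"{z} is not a complementative grammeme")
    name = f"\\complem_{SUBSCRIPT[x_cat]}{SUBSCRIPT[y_cat]}"
    return (f"{name}({x_text}, {z}, {y_text})", x_cat)
```

For instance, complem(("cup", "Noun"), "CONT", ("tea", "Noun")) selects the noun-with-noun variant and returns a term of category Noun, in line with Axiom 15.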


Locative phrases are a special type of modifiers used to indicate a spatio- 
temporal localization. Axiom 16 accounts for this syntactic phenomenon which 
we call localization (\loc): 


Axiom 16: Localization 


∀xy(x : Subst ∧ y : Locative → \loc(x, y) : AdCat)

Locative grammemes can be IN, OUT, TO, FROM, AROUND, ACROSS,
BETWEEN, NEAR, OVER, TOWARDS, BEFORE, AFTER, etc.
An example of a locative phrase is:


• ‘(Jane) found the turtle under the table’: \modif(\complem_VS(found, the
turtle), \loc(the table, UNDER))


Coordination (\coord) allows us to connect two (or more) categories of the 
same type by providing a category of that type. If two elements are of the 
same category, then they can be joined together and the resulting unit is of 
the same type. Any category can be coordinated (by means of a coordinative 
grammeme: AND (addition), OR (alternation), BUT (variation)) with a category 
of the same type and the result will be an item of the same category. Axiom 
17 formalizes this idea. 


Axiom 17: Coordination 
∀xyz(x : Cat ∧ y : Cat ∧ z : Coordinative → \coord(x, z, y) : Cat)


Consider the sentence ‘We saw many students of chemistry and doctors
of medicine’, where ‘students of chemistry’ and ‘doctors of medicine’ are
coordinated. In terms of GNS, the coordinated categories can be formalized as
follows:


\coord(\complem_NN(students, chemistry), AND, \complem_NN(doctors,
medicine))


Subordination connects elements that have a different grammatical ‘sta- 
tus’, one of which is subordinate to or dependent on the other. Subordina- 
tive grammemes are: BECAUSE (reason), IF (condition), WHERE (place), WHEN, 
AFTER, BEFORE (time), WHILE, UNTIL (duration), THOUGH (concession), FOR, 
TO (purpose). Axioms 18 and 19 give a formal account in GNS of this opera- 
tion. 


Axiom 18: Subordination 


∀xyz(x : Cat ∧ y : Cat ∧ z : Subordinative → \subord(x, z, y) : Cat)


Axiom 19: Subordination 


∀xyz(x : Cat ∧ y : Cat ∧ z : Subordinative → \subord(x, z, y) : Cat ∧ x =
Ker(\subord(x, z, y)))

Examples of the subordination of various classes of categories are: ‘poor but
happy’, ‘late though not too late’, ‘enemy for joke’, ‘going to see’... The reader
can easily check how to formalize these examples by using the above axioms.


Negation (\neg) is a syntactic operation that can be applied to any Cat 
and the result will be an item of the same Cat. Axiom 20 formalizes this 
operation. 


Axiom 20: Negation 
∀x(x : Cat → \neg(x) : Cat)
Examples of the application of this operation in the GNS framework are:


• ‘(I do) not agree’: \neg(agree)
• ‘(It is) not soon’: \neg(soon)


Passivation (\passiv) is an operation that like negation does not change 
the syntactic category of the element it is applied to. It is based on the fact 
that the order of the subject and object can be reversed in a predication with 
transitive verbs. In this case, if P is a proposition to which passivation can be 
applied, then \passiv(P) is its passive form. The actual means to calculate 
the value of \passiv(P) is a matter of concrete syntax. Formally: 


Axiom 21: Passivation 
∀x(x : Prop → \passiv(x) : Prop)


Axioms 22, 23 and 24 formalize cases of transcategorization. Adcategoriza- 
tion stands for the operations of adjectivation, adverbialization and adpropo- 
sition considered in the following axioms. 


Adjectivation (\adj) takes as its input a Noun, a Subst, a Verb or a Verbt 
and provides an AdNoun as shown in axiom 22. 


Axiom 22: Adjectivation 


∀x((x : Noun ∨ x : Subst ∨ x : Verb ∨ x : Verbt) → \adj(x) : AdNoun)


Adverbialization (\adverb) takes as an input an AdNoun and provides an
AdVerb.


Axiom 23: Adverbialization


∀x(x : AdNoun → \adverb(x) : AdVerb)


And finally, adproposition (\adprop) yields an adproposition element from 
either an AdNoun or a Subst. 


Axiom 24: Adproposition 
∀x(x : AdNoun ∨ x : Subst → \adprop(x) : AdProp)
Examples of the above operations are the following:


• ‘milk cup’: \modif(cup, \adj(milk))
• ‘slowly’: \adverb(slow)
• ‘This way, (Doris feeds her guppies)’: \adprop(this way)


Relativization (\relat) is the syntactic operation that transforms a Prop
into a Noun. In order to describe this construction, we assume a special
grammeme REL that has the category of Subst (actually, a finite number
of different relative grammemes REL1, REL2, REL3, ... could be necessary in
certain cases). For example, if proposition P is ‘REL was on the table’, then
\relat(P) is a noun, so the following syntactic term:


‘I am looking for the \modif(pen, \adj(\relat(P)))’


represents the statement: ‘I am looking for the pen that was on the table.’
Axiom 25 formalizes relativization, where Prop(REL) are the propositions
constructed by the special substantive REL.
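
How a proposition built with REL might be turned into surface text can be sketched as follows. This is only an illustration of one possible concrete-syntax step for English, under the assumptions that REL occurs as a separate word of the proposition and is realized as the relative pronoun ‘that’; the function names are hypothetical and GNS itself leaves the realization to the concrete syntax of each language.

```python
REL = "REL"


def relat_surface(prop, pronoun="that"):
    """Realize a proposition of Prop(REL) as an English relative clause."""
    words = prop.split()
    if REL not in words:
        raise ValueError("not a proposition constructed with REL")
    return " ".join(pronoun if w == REL else w for w in words)


def modify_noun(noun, relative_clause):
    """Attach the adjectivized relative clause after the noun it modifies."""
    return f"{noun} {relative_clause}"
```

With P = ‘REL was on the table’, modify_noun("pen", relat_surface("REL was on the table")) produces ‘pen that was on the table’, the noun phrase that appears in the statement above.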


Axiom 25: Relativization 


∀x(x : Prop(REL) → \relat(x) : Noun)


A syntactic operation formally described by the element of Prop (REL) is 
the one that underlies the so-called phenomenon of anacoluthon, a construc- 
tion present in ancient languages and in colloquial forms of modern languages. 
We introduce the operation of anacoluthon (\anacoluth) defined in axiom 26. 


Axiom 26: Anacoluthon 


∀x(x : Prop(REL) → \anacoluth(x) : Verb)

An example of anacoluthon formalized in GNS terms is the following:


‘John, I told him to go away.’: \pred(John, \conjug(\anacoluth(I told REL
to go away), III, SING))


Axioms 27 and 28 formalize the operations of numeralization and ordinal- 
ization, respectively. Numeralization (\num) takes as its arguments a cardinal 
number (two, three, ten, two hundred, three thousand, a dozen, hundreds) and 
a noun and provides a noun. However, this operation is only defined for some 
nouns, which are called countable nouns. Ordinalization (\ord) takes as its 
arguments an ordinal number (first, second, last, next...) and a noun and 
provides a noun. 


Axiom 27: Numeralization 


∀xy(x : Noun ∧ y : Cardinal → \num(x, y) : Noun)


Axiom 28: Ordinalization
∀xy(x : Noun ∧ y : Ordinal → \ord(x, y) : Noun)
The following two phrases are examples of these operations:
• ‘Three friends’: \num(friend, 3)
• ‘The first friend’: \determ(\ord(friend, 1), DEF)


Deictification (\deixis) is an operation that takes as its arguments some 
grammemes (conjugative, determinative: PRES, PAST, I, II, III, 
DEF/INDEF, NEAR, FAR) and provides a Subst. Axiom 29 formalizes this idea. 


Axiom 29: Deictification 
∀x(x : Deictic → \deixis(x) : Subst)
English examples of deictification are the following:


• \deixis(PRES) = ‘Now’
• \deixis(I, SING) = ‘I’
• \deixis(NEAR) = ‘Here’

Anaphorization (\anaphor) is a construct that assigns a label to syntactic
terms (pronoun) which can be used instead of the labeled term itself. Notice
that the label (pronoun or whatever) is something that concerns the concrete
syntax and not the abstract syntax we are defining here. Taking this into
account we define this operation in axiom 30.


Axiom 30: Anaphorization 
∀x(x : Cat → \anaphor(x) : Cat)
An example:


• ‘On the table there is a pen, it is red’: \anaphor(a pen) = ‘it’


Quantification (\quant) can be considered as a special case of determi- 
nation. However, it presents some subtle logical and semantic features that 
mean that it has to be identified separately from determination. First of all, 
quantification can only be applied to countable nouns or their plurals and 
provides as its result a substantive with a ‘collective and distributive’ nature. 
It is formalized in axiom 31. 


Axiom 31: Quantification 


∀xy(x : Quantifier ∧ y : Noun-Countable → \quant(x, y) : Subst)


∀xy(x : Quantifier ∧ y : Subst-Plural → \quant(x, y) : Subst)


Quantifiers are grammemes that express the functionalities of words that 
classical grammars classify as distributive pronouns: EVERY, EACH, ALL, 
ANY, FEW, SOME, MOST, MANY, etc. (Some of them are very often referred 
to as ‘partitive’ pronouns). 


Intensification (\intens) consists of the use of grammemes that express 
intensity or non-exact quantity with uncountable nouns. Examples of inten- 
sive grammemes are MUCH, A LITTLE, FEW, ALMOST, MORE, LESS, VERY, 
WHOLE, PARTIAL, ENOUGH, TOO... Axiom 32 provides a formal account of this
operation.


Axiom 32: Intensification
∀xy(x : Noun-Uncountable ∧ y : Intensive → \intens(x, y) : Noun)
∀xy(x : AdNoun ∧ y : Intensive → \intens(x, y) : AdNoun)


Comparison (\compar) is related to degree particles such as more-than,
less-than, so-as. Comparison can be obtained in three ways: 1) with two terms
of a given category Cat and a comparison grammeme (that is, one of <, > or =);
2) with a Verb or an AdNoun, a comparison grammeme, and a Subst; 3) with
a Verb or an AdNoun, a comparison grammeme, and a Prop. This idea is
captured in axiom 33.


Axiom 33: Comparison 


∀xyz(x : Cat ∧ y : Cat ∧ z : Comparative → \compar(x, y, z) : Cat) ∧


∀xyzT(x : T ∧ (T = Verb ∨ T = AdNoun) ∧ y : Subst ∧ z : Comparative
→ \compar(x, y, z) : T) ∧


∀xyzT(x : T ∧ (T = Verb ∨ T = AdNoun) ∧ y : Prop ∧ z : Comparative
→ \compar(x, y, z) : T)


Examples of comparison are the following: 


• \compar(beautiful, good, >): ‘more beautiful than good’
• \compar(beautiful, Mary, =): ‘as beautiful as Mary’
• \compar(good, the original, =): ‘as good as the original’
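
The three cases of Axiom 33 can be summarized as a function from the categories of the two compared terms to the category of the result. The sketch below is a hypothetical helper, not part of GNS; it assumes, as the first example above suggests, that the first case applies to any two terms of the same category, including AdNoun.

```python
def compar_category(x_cat, y_cat):
    """Category of \\compar(x, y, z) for a comparative grammeme z, or None."""
    if x_cat == y_cat:
        return x_cat   # case 1: two terms of the same category
    if x_cat in ("Verb", "AdNoun") and y_cat in ("Subst", "Prop"):
        return x_cat   # cases 2 and 3: the result keeps the category of x
    return None        # no case of the axiom applies
```

For ‘more beautiful than good’ both terms are of category AdNoun and the first case applies; for ‘as beautiful as Mary’ the second term is a Subst and the result is again an AdNoun.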


Axioms 34 and 35 account for the operation we have called correlation. 
Correlation (\correl) is found in constructions where two propositions are 
related by means of an intensivity degree of a Verb or an AdNoun that occurs 
in the first proposition (e.g. so-that, such-that). 


Axiom 34: Correlation 


∀pqxuvwy(p : Prop ∧ q : Prop ∧ p = uwv ∧ (w : Verb ∨ w : AdNoun) ∧ w =
\intens(x, y) → \correl(p, w, q) : Prop)

Axiom 35: Correlation


∀pqxuvwzyrstd((p : Prop ∧ p = uwv ∧ (w : Verb ∨ w : AdNoun)) ∧ (q :
Prop ∧ q = zxy ∧ (x : Verb ∨ x : AdNoun)) ∧ t = \compar(w, r, REL) ∧ d =
\compar(x, s, REL) → \correl(p, t, q, d) : Prop)


Examples of those two types of correlation are the following: 


• ‘(This is) so big that you cannot carry it.’: \correl(This is very big,
\intens(big, VERY), you cannot carry it)

• ‘The quicker you are, the more they appreciate your work’: \correl(you
are quick, \compar(quick, >, REL), they appreciate your work,
\compar(appreciate, >, REL))


Substantivation (\substantiv) and nominalization (\nomin) transcategorize
some categories into Subst (substantives) and Noun (nouns), respectively.
According to a general principle of language, any linguistic entity can become
something that can be spoken about; that is, any category can be nominalized
and substantivized. We formalize this principle with axioms 36 and 37, which
account for substantivation and nominalization, respectively.


Axiom 36: Substantivation 


∀x(x : Prop ∨ x : Noun ∨ x : AdNoun → \substantiv(x) : Subst)


Axiom 37: Nominalization 

∀x(x : Verb ∨ x : AdNoun → \nomin(x) : Noun)

Examples of the two operations above are:

• ‘eating’: \substantiv(\nomin(to eat))

• ‘emptiness’: \nomin(empty)


A particular case of substantivation is quotation (\quot) which transforms 
any sequence of letters into a Subst. This particular case of substantivation 
is formalized in axiom 38. 


Axiom 38: Quotation 


∀x(x : Cat → \quot(x) : Subst)

Consider the following sentence:


“He said: ‘Tomorrow I will finish the paper’”


Here a full proposition works as a substantive. So, 


‘Tomorrow I will finish the paper’: \quot (Tomorrow I will finish the paper) 


The last two operations in the GNS framework are interrogation (\interr)
and imperativation (\imperat). These general syntactic operations transform
any descriptive proposition into the corresponding proposition whose
communication modality is interrogative or imperative, respectively. Axioms
39 and 40 account for them.


Axiom 39: Interrogation 


∀x(x : Prop → \interrog(x) : Prop)


Axiom 40: Imperativation


∀x(x : Prop → \imperat(x) : Prop)


4 An Example 


In this section we provide an example of how GNS works. First, we analyze the
following English sentence; then we show that the formal representation
of this English sentence, with only the basic words translated, can be used to
generate equivalent Italian and Spanish sentences.


“The two children ran towards the river with their hands raised till they 
reached the bank” 


The above English sentence is analyzed in the following way (we use a # 
to indicate the line number): 


1. hand

2. to raise

3. child

4. river

5. to run

6. to reach

7. bank 

8. \conjug(# 2, PART, PAST) = raised 

9. \modif (#1, \adj(# 8)) = hand raised 

10. \determ(# 9, DEF, PLUR) = their hands raised 

11. \num(# 3, 2) = two children 

12. \determ(# 11, DEF) = the two children 

13. \conjug(# 5, III, PLUR, PAST, IMP) = ran 

14. \determ(# 4, DEF) = the river 

15. \loc(# 14, TOWARDS) = towards the river 

16. \complem(# 13, # 15) = ran towards the river 

17. \complem(# 16, MANN, #10) = ran towards the river with their hands 
raised 

18. \pred(# 12, # 17 ) = the two children ran towards the river with their 
hands raised 

19. \conjug(# 6, III, PLUR, PAST, PERF) = reached 

20. \determ(# 7, DEF) = the bank 

21. \complem(# 19, # 20) = reached the bank

22. \pred(\deixis(III, PLUR) , # 21) = they reached the bank 

23. \subord(# 18, UNTIL, # 22) = The two children ran towards the river with 
their hands raised till they reached the bank. 
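
The analysis above can be mirrored as a nested term, from which the basic elements and the operations are read off mechanically. The sketch below is only an illustration: the tuple encoding, the variable names and the convention that lower-case strings are basic elements while upper-case strings and numbers are grammemes are assumptions, not part of GNS. Step 21 is encoded with plain complem.

```python
# Each operation is a tuple (name, arg1, arg2, ...); comments give the step numbers.
raised = ("conjug", "to raise", "PART", "PAST")                      # 8
hand_raised = ("modif", "hand", ("adj", raised))                     # 9
their_hands = ("determ", hand_raised, "DEF", "PLUR")                 # 10
two_children = ("num", "child", 2)                                   # 11
the_children = ("determ", two_children, "DEF")                       # 12
ran = ("conjug", "to run", "III", "PLUR", "PAST", "IMP")             # 13
the_river = ("determ", "river", "DEF")                               # 14
towards_river = ("loc", the_river, "TOWARDS")                        # 15
ran_towards = ("complem", ran, towards_river)                        # 16
ran_with_hands = ("complem", ran_towards, "MANN", their_hands)       # 17
main_clause = ("pred", the_children, ran_with_hands)                 # 18
reached = ("conjug", "to reach", "III", "PLUR", "PAST", "PERF")      # 19
the_bank = ("determ", "bank", "DEF")                                 # 20
reached_bank = ("complem", reached, the_bank)                        # 21
they_reached = ("pred", ("deixis", "III", "PLUR"), reached_bank)     # 22
sentence = ("subord", main_clause, "UNTIL", they_reached)            # 23

def walk(term):
    """Yield ('op', name) for operations and ('basic', word) for basic elements."""
    if isinstance(term, tuple):
        yield ("op", term[0])
        for arg in term[1:]:
            yield from walk(arg)
    elif isinstance(term, str) and term.islower():
        yield ("basic", term)   # upper-case grammemes and numbers are skipped

operations = {name for kind, name in walk(sentence) if kind == "op"}
basics = {name for kind, name in walk(sentence) if kind == "basic"}
print(sorted(basics))
print(sorted(operations))
```

The collected operations also include adj, which step 9 uses inside \modif.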


In the end we have the basic elements and operations shown in Table 3. 


BASIC ELEMENTS    OPERATIONS


child             \subord
to run            \pred
river             \determ
hand              \num
to raise          \complem
to reach          \conjug
bank              \deixis
                  \loc
                  \modif


Table 3: Basic Elements and Operations. 


If we now take the above GNS formalization and translate only the so-called
basic elements into Italian or Spanish (Table 4), we obtain one equivalent
sentence in each language.


English     Italian        Spanish

child       bambino        niño
to run      correre        correr
river       fiume          río
hand        mano           mano
to raise    alzare         levantar
to reach    raggiungere    alcanzar
bank        riva           orilla


Table 4: Translation of Basic Elements


The two sentences are:


• I due bambini correvano verso il fiume con le mani alzate finché non
raggiunsero la riva.

• Los dos niños corrieron hacia el río con las manos levantadas hasta que
alcanzaron la orilla.
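
The translation step amounts to replacing the basic elements at the leaves of a GNS term while leaving operations and grammemes untouched. The following Python sketch illustrates this on a small fragment; the tuple encoding and the function name `translate` are assumptions, and the language-specific generation of the surface sentence is not modelled.

```python
# Table 4 as two lexicons; only basic elements are translated.
ITALIAN = {"child": "bambino", "to run": "correre", "river": "fiume", "hand": "mano",
           "to raise": "alzare", "to reach": "raggiungere", "bank": "riva"}
SPANISH = {"child": "niño", "to run": "correr", "river": "río", "hand": "mano",
           "to raise": "levantar", "to reach": "alcanzar", "bank": "orilla"}

def translate(term, lexicon):
    """Replace basic elements in a nested (operation, args...) term."""
    if isinstance(term, tuple):
        # Keep the operation name, translate the arguments recursively.
        return (term[0],) + tuple(translate(arg, lexicon) for arg in term[1:])
    return lexicon.get(term, term)   # grammemes and numbers are left untouched

# Step 12 of the analysis: determ(num(child, 2), DEF) = 'the two children'
the_two_children = ("determ", ("num", "child", 2), "DEF")
print(translate(the_two_children, ITALIAN))
print(translate(the_two_children, SPANISH))
```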


5 Conclusions 


In this paper we have presented a formal theory of general explicative power:
General Natural Syntax. The aim behind this framework is to reduce syntactic
constructions to a few principles related to their semantic functions but
defined independently of semantics. In this regard, it can be related to the
so-called abstract syntax of programming languages. In fact, GNS establishes the
requirements for the construction of terms and the rules for assigning them
syntactic categories. It does not deal with the final surface syntactic forms
of concrete syntax. It does not try to cope with the particular morphological
features of the final linguistic form of the expression resulting from the
application of a syntactic operation; rather, it is interested in a deeper
syntactic level: GNS aims to define an abstract syntax for natural languages.

Within GNS, many classical notions can be stated formally, in terms of
basic categories and basic abstract syntactic operations. GNS has the
advantage of simplicity. With a very small number of simple ingredients, GNS can
account for general syntactic constructions in any natural language. GNS
generalizes and formalizes syntactic phenomena present in every natural language
and incorporates them into a formal theory of general explicative power.
Although GNS is very recent and still requires further research, we feel that
its features may make it suitable for the fields of natural language
processing, machine translation and linguistics.


Our future work will focus on two main aspects: 1) basic lexicon and
grammemes; and 2) formal and computational work. Regarding the former, we are
interested in establishing a basic lexicon and an exact number of grammemes.
Regarding the latter, it would be very useful to define an interface to generate
GNS formulas and to develop a system that can generate sentences from the
GNS formulas. Besides those two topics, it would be very interesting to go
deeper into the possible applications of the framework.
Summary. In this paper we compare the well-known DES cryptosystem with the
recently introduced Dömösi system, which is based on finite automata. We carry
out a time complexity analysis of both algorithms. We show that without making
use of an auxiliary matrix the Dömösi cryptosystem is slower than DES. However,
the use of auxiliary matrices makes the former perform better than its
well-known counterpart for some block lengths.


1 The Data Encryption Standard (DES) 


First let us take a look at the DES cryptosystem. If we consider the
following to be elementary steps (es): reading input, comparing two values,
jumping, then the system has the following requirements:


1. First we perform the initial permutation, as seen in Figures 1 and 2.
   This phase consists of 64 elementary steps.

2. In each of the 16 rounds:
   a) Make a copy of the current half of the 64 bits (1 es)
   b) Extend the current 32 bits to 48 bits (48 es) (Figure 3)
   c) Read the key (1 es)
   d) XOR the 48 bits resulting from 2b with the key read (1 es)
   e) Do 8 S-box 6→4 bit mappings (3×8 = 24 es) (Figure 4)
   f) Apply the 32 bit permutation (32 es) (Figure 5)
   g) XOR what we have so far and the remaining 32 bits (1 es) (Figure 6)

3. Swap the two sides (1 es)

4. Perform the inverse permutation (64 es), as seen in Figures 7 and 8


58 50 42 34 26 18 10  2
60 52 44 36 28 20 12  4
62 54 46 38 30 22 14  6
64 56 48 40 32 24 16  8
57 49 41 33 25 17  9  1
59 51 43 35 27 19 11  3
61 53 45 37 29 21 13  5
63 55 47 39 31 23 15  7

[Table of the initial permutation (IP)]


Fig. 3. Expansion function (E)


We can now summarize the calculations on each of the steps. More exactly we 
will get the following: 
14  4 13  1  2 15 11  8  3 10  6 12  5  9  0  7
 0 15  7  4 14  2 13  1 10  6 12 11  9  5  3  8
 4  1 14  8 13  6  2 11 15 12  9  7  3 10  5  0
15 12  8  2  4  9  1  7  5 11  3 14 10  0  6 13


Fig. 4. Substitution box (S-box) S1


16  7 20 21
29 12 28 17


Fig. 5. Permutation (P)


Fig. 6. XOR with the remaining 32 bits


Fig. 7. Final permutation (IP⁻¹)


40  8 48 16 56 24 64 32
39  7 47 15 55 23 63 31
38  6 46 14 54 22 62 30
37  5 45 13 53 21 61 29
36  4 44 12 52 20 60 28
35  3 43 11 51 19 59 27
34  2 42 10 50 18 58 26
33  1 41  9 49 17 57 25


Fig. 8. Inverse of IP


1st step : 64 es
2nd step : (1 + 48 + 1 + 1 + 24 + 32 + 1) × 16 = 1728 es
3rd step : 1 es
4th step : 64 es

The number of steps needed to perform the 64-bit DES is given by the table
in Figure 9. Summing up this table gives a total of 1857 steps. For a detailed and
more formal time complexity analysis of the algorithm, please refer to [3, 4, 5].
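
The total can be re-derived from the per-step counts listed earlier. The short Python sketch below does only this arithmetic; the dictionary keys are descriptive labels chosen here, not names from the paper.

```python
ROUNDS = 16

# Elementary steps (es) of one round, steps 2a to 2g.
ROUND_STEPS = {
    "2a copy 32 bits": 1,
    "2b expand 32 -> 48 bits": 48,
    "2c read key": 1,
    "2d 48-bit XOR": 1,
    "2e eight S-box mappings": 3 * 8,
    "2f 32-bit permutation": 32,
    "2g 32-bit XOR": 1,
}

per_round = sum(ROUND_STEPS.values())
# initial permutation + 16 rounds + swap + inverse permutation
total = 64 + ROUNDS * per_round + 1 + 64
print(per_round, total)  # 108 1857
```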


Step  Operation                          Times  Equivalent total  Notes

1.    IP 64 bit transposition            1      64
2.a   32 bit copy                        16     16                *16 steps
2.b   48 bit transposition               16     48×16             *
2.c   READ the key                       16     16                *
2.d   48 bit XOR                         16     16                *
2.e   6→4 bit two-dimensional mapping    8×16   3×128             *
2.f   32 bit transposition               16     32×16             *
2.g   32 bit XOR                         16     16                *
3.    32 bit swapping                    1      1
4.    IP⁻¹                               1      64


Fig. 9. Number of steps required by DES 
2 The Dömösi System


Let us now move on to the Dömösi system and take a look at the number of
elementary steps needed to process 64 bits with or without using an auxiliary
matrix.


2.1 Without auxiliary matrix


When considering a Dömösi system without an auxiliary matrix, starting from a
final state we have the following phases:


1. Read a character: 8 × [1], where the number between [ ] is the number of
elementary steps; read a character from the previously generated random number
row: 8 × [1]; in the transition matrix find the state transition corresponding
to the random number read: 8 × [Psurit], where Psurit is the number of
elementary steps of the logical and physical correspondence for the transition
matrix.

2. This phase is a longer one that depends on the length k of the given code word.
The expected step count is 8 × k × [2 × Psurit + 4].

3. The expected cost of the 3rd step is 8 × [2 × Psurit + 4].

4. Now that the parity is correct, we have to look for the input sign which will take
the automaton into a final state. Based on the reference we can expect that this
takes 8 × [128 × Psurit + 132] steps, that is, with 4 "if"s and with final
state compression and comparison reachable in 128 steps.


The phases are explicitly described in Figure 10.

These four cases consist altogether of k steps of wandering, parity change and
state identifying. Together with the cost of the first steps this gives a total
number of elementary steps described by:


8 × [Psurit × (2k + 131) + 144 + k × 4]


If we suppose c to be the extra cost for every k steps, in other words an upper
bound on the cost of the extra steps performed by the algorithm, then
the formula becomes 8 × [Psurit × (2k + 131) + 144 + k × (4 + c)]. When
implementing in Windows, this extra cost is large for any algorithm, depending on
Windows’ event handler and scheduler. However, for the purpose of theoretical time
complexity analysis we can disregard this cost, as it is insignificant compared to the
other factors.

Comparison: In the case of the Dömösi system, processing 8 bytes while
generating code words of length k takes 8 × (Psurit × (2k + 131) + 144 + k × (4 + c))
elementary steps. Furthermore, we can take Psurit to be equal to 3 (as is
assumed in Figure 10) and c to be 0. This way the time cost of processing an
8-byte block (the DES block size) becomes:
Step  Operation                            Times    Equivalent total  Notes

1a    Read a character                     8        8
1b    Read the next random value           8        8
1c    Mapping                              8        8*Psurit
2a    Read the next random value           8*k*2    8*k*2             *probably in two steps
2b    Mapping                              8*k*2    8*k*2*Psurit      *probably in two steps
2c    Comparison (random in                8*k*2    8*k*2             *probably in two steps
      non-final state)
3a    Read the next random value           8*2      8*2               *probably in two steps
3b    Mapping                              8*2      8*2*Psurit        *probably in two steps
3c    Comparison (random in non-final      8*2      8*2               *probably in two steps
      state with correct parity)
4a    Checking                             8        8*4               *check the cases
4b    Mapping to the right final state     8*128    8*128*Psurit      *probably in 128 steps
4c    Comparison (map value, final state)  8*128    8*128             *probably in 128 steps


Fig. 10. No auxiliary matrix 


8 × (3 × (2k + 131) + 144 + k × 4) = 8 × (10k + 537) = 80k + 4296


By looking at the ratio (80k + 4296)/1857 we see that without the auxiliary matrix
the algorithm is slower than DES.
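
The comparison can be checked numerically. The sketch below simply evaluates the cost formula given above with the stated defaults Psurit = 3 and c = 0; the function and parameter names are chosen here for illustration, and the constant 144 is taken from the text as given.

```python
DES_STEPS = 1857  # total from Figure 9

def domosi_steps_no_matrix(k, p_surit=3, c=0):
    """Elementary steps for 8 bytes with code words of length k, no auxiliary matrix."""
    return 8 * (p_surit * (2 * k + 131) + 144 + k * (4 + c))

for k in (1, 8, 21):
    steps = domosi_steps_no_matrix(k)
    print(k, steps, round(steps / DES_STEPS, 2))
```

Since the constant term 4296 alone already exceeds 1857, the ratio is above 2 for every k.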


2.2 With auxiliary matrix 


In this section we compare the DES cryptosystem with the Dömösi system that uses
auxiliary matrices.

With the introduction of the auxiliary matrix the cost becomes minimal in Case
4, as can be seen in the table of Figure 11. In the case of DES this cost is 6.
Thus, the formula is transformed into Psurit × (3 + 2k) + 4k + 12 + 6.
Considering once more Psurit = 3 we get 80k + 168 = 1857. Since the equality
holds for k ≈ 21, it follows that for block length 21 the Dömösi cryptosystem
with auxiliary matrix performs at the same level as the DES cryptosystem. Thus,
for blocks of shorter length the Dömösi cryptosystem is faster than DES while,
naturally, for longer blocks it is slower. It remains an open question how
secure the system is at these block lengths.


Step  Operation                            Times    Equivalent total  Notes

1a    Read a character                     8        8
1b    Read the next random value           8        8
1c    Mapping                              8        8*Psurit
2a    Read the next random value           8*k*2    8*k*2             *probably in two steps
2b    Mapping                              8*k*2    8*k*2*Psurit      *probably in two steps
2c    Comparison (random in                8*k*2    8*k*2             *probably in two steps
      non-final state)
3a    Read the next random value           8*2      8*2               *probably in two steps
3b    Mapping                              8*2      8*2*Psurit        *probably in two steps
3c    Comparison (random in non-final      8*2      8*2               *probably in two steps
      state with correct parity)
4a    Checking                             8        8*4               *check the cases
4b    Mapping to the right final state     8*6      8*6*Psurit        *exactly in 6 steps


Fig. 11. Using auxiliary matrices
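
The break-even block length follows directly from the equation 80k + 168 = 1857 quoted above. The Python sketch below takes that total as given in the text (it is not re-derived from Figure 11); the function name is chosen here for illustration.

```python
DES_STEPS = 1857

def domosi_steps_with_matrix(k):
    # Total quoted in the text for Psurit = 3; taken as given, not re-derived.
    return 80 * k + 168

break_even = (DES_STEPS - 168) / 80
print(break_even)

# Code word lengths for which the quoted total stays below the DES total.
faster = [k for k in range(1, 40) if domosi_steps_with_matrix(k) < DES_STEPS]
print(max(faster))
```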
Summary. Both deterministic and non-deterministic finite state machines
(automata) recognize exactly the regular languages. Here we extend these
machines with two heads to characterize the even-linear and linear languages.
The heads move in opposite directions in these automata. For even-linear
languages, deterministic automata have the same power as non-deterministic
ones, but for the general case (linear languages) only the non-deterministic
version is sufficient. We compare our automata to other two-head automata as
well.


1 Introduction 


The theory of automata is well developed and applicable in many theoretical
and practical fields. The class of finite automata (both deterministic and
non-deterministic) characterizes the regular languages. They have many
interesting extensions, for instance probabilistic and weighted automata.

Chomsky-type grammars and the language families they generate are among the
most basic and important topics in theoretical computer science [1, 10].

In some senses, linear languages are more closely related to regular languages
than to context-free ones. For instance, in [11] regular-like expressions are
used to describe linear languages. There is a language class between the linear
and regular ones (namely, the even-linear languages) which plays an important
role in learning theory as well [12].

In this paper we present a class of 2-head finite automata which characterize 
linear context-free languages exactly. We analyse the deterministic versions of this 
class of automata as well. The normal form of these automata and a special class 
characterizing the even-linear languages are also presented. Some relations to other 
two-head automata, for instance to the Watson-Crick automata [9, 8], will also be 
discussed. 


2 Preliminaries 


In this section we recall some well-known concepts of formal language and automata 
theory. 

Let V be a finite non-empty set of symbols (usually called letters). The strings
built up from letters are called words. Sets of words are the languages over the
alphabet V. In this paper the sign ε refers to the empty word.

First we recall both the deterministic and non-deterministic finite state
machines.


Definition 1. A 5-tuple A = (Q, s, V, δ, F) is a finite state machine or finite
automaton, where Q is the finite (non-empty) set of states; s ∈ Q is the initial
state; V is the (input) alphabet and F ⊆ Q is the set of final (or accepting)
states. The function δ is the transition function. Two extremal forms of this
function are used. If δ : Q × (V ∪ {ε}) → 2^Q, then the device is a
non-deterministic finite automaton. If δ : Q × V → Q, then the machine is called
a deterministic finite automaton.


A word w is accepted by a finite automaton if there is a run starting in s and
ending in a state in F such that the symbols of the transitions along the path
yield w. Now we recall some language families related to the Chomsky hierarchy.


Definition 2. A grammar is a construct G = (N, V, S, H), where N and V are the
non-terminal and terminal alphabets, with N ∩ V = ∅; they are finite sets.
S ∈ N is a special symbol, called the initial letter. H is a finite set of
pairs, where a pair is usually written in the form v → w with
v ∈ (N ∪ V)*N(N ∪ V)* and w ∈ (N ∪ V)*. (We use the well-known Kleene-star
notation.) H is the set of derivation rules.
Let G be a grammar and v, w ∈ (N ∪ V)*. Then v ⇒ w is a direct derivation
if and only if there exist v1, v2, v', w' ∈ (N ∪ V)* such that v = v1v'v2,
w = v1w'v2 and v' → w' ∈ H. A derivation v ⇒* u holds if and only if either
v = u or there is a finite sequence of sentential forms connecting them as
v = v0, v1, ..., vm = u in which vi ⇒ vi+1 is a direct derivation for each
0 ≤ i < m.

A sequence of letters v for which S ⇒* v and v ∈ (N ∪ V)* holds is called
a sentential form. The language generated by a grammar G is the set of terminal
words that can be derived from the initial letter: L(G) = {w | S ⇒* w ∧ w ∈ V*}.

Two grammars are (weakly) equivalent if they generate the same language
(modulo ε).

Depending on the possible structures of the derivation rules we are interested in
the following classes.

• type 2, or context-free (CF) grammars: every rule has the form A → v
with A ∈ N and v ∈ (N ∪ V)*.

• linear (lin) grammars: each rule has one of the forms A → v, A → vBw,
where A, B ∈ N and v, w ∈ V*.

• even-linear (elin) grammars: each rule has one of the forms A → v,
A → w1Bw2, where A, B ∈ N and v, w1, w2 ∈ V*, and the length of w1 equals the
length of w2 for each rule.

• type 3, or regular (reg) grammars: each derivation rule has one of the
forms A → w, A → wB, where A, B ∈ N and w ∈ V*. (Note that this form is the
so-called right-linear form of these grammars; we will use alternative forms
later as well.)

The generating powers of these grammars form the following hierarchy:
L_reg ⊊ L_elin ⊊ L_lin ⊊ L_CF.
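
The rule formats of the four classes can be checked mechanically. The following Python sketch is an illustration only: it assumes that every symbol is a single character and that the caller supplies the set of non-terminals, and the function name `rule_classes` is chosen here.

```python
def rule_classes(lhs, rhs, nonterminals):
    """Return the classes (CF, lin, elin, reg) whose rule format lhs -> rhs fits."""
    if lhs not in nonterminals:
        return set()
    positions = [i for i, symbol in enumerate(rhs) if symbol in nonterminals]
    classes = {"CF"}
    if not positions:                        # A -> v, only terminals on the right
        classes |= {"lin", "elin", "reg"}
    elif len(positions) == 1:                # A -> v B w
        left, right = rhs[:positions[0]], rhs[positions[0] + 1:]
        classes.add("lin")
        if len(left) == len(right):          # |w1| = |w2|
            classes.add("elin")
        if right == "":                      # right-linear form A -> w B
            classes.add("reg")
    return classes

N = {"S"}
print(rule_classes("S", "aSb", N))   # fits CF, lin, elin
print(rule_classes("S", "aS", N))    # fits CF, lin, reg
print(rule_classes("S", "SS", N))    # fits CF only
```

A grammar belongs to a class when all of its rules fit that class, so intersecting the returned sets over all rules classifies the grammar as written.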

Now we present normal forms for the rules of linear / even-linear / regular
grammars. (For regular grammars, for instance, this is well-known and widely
used.)


Lemma 1. Every linear grammar has an equivalent grammar in which all rules are
of the forms A → aB, A → Ba, A → a.

Every even-linear grammar has an equivalent grammar in which all rules are of
the forms A → aBb, A → a, A → ε.

Every regular language can be generated by a grammar having only rules of the
types A → aB, A → a (A, B ∈ N, a, b ∈ V).


Proof. By introducing new non-terminals, each rule can be replaced by a sequence
of rules in the desired forms.


For context-free languages the concept of push-down automata fits. In the
literature, push-down automata with a restriction are used for linear languages
(as they are special context-free languages). The restriction is the following:
once the content of the stack has decreased in a transition, the automaton
cannot push anything onto the stack again. These special push-down automata are
called 1-turn push-down automata.

From a derivation-tree point of view linear languages are more closely related
to regular ones (Fig. 1, [6]). On the basis of this observation we modify the
well-known concept of finite automata to get an accepting device for linear
languages.


Fig. 1. Derivations in regular grammar and in linear grammar (in normal form) 


There is at most 1 non-terminal in each sentential form, so it can be modelled
by a finite-state machine as we will show in the next section.


3 Definition of 2-Head Finite Automata Accepting 
Linear Languages 


We construct finite automata with two heads. They read the word from the beginning 
and the end, in parallel. 


Definition 3. A 2-head finite automaton is a 5-tuple (Q, s, V, d, F) with the
transition function d : Q × (V ∪ {ε}) × (V ∪ {ε}) → 2^Q, where Q is the finite
set of states, s ∈ Q is the starting state, F ⊆ Q contains the final states,
and V is the alphabet (as usual, the set of terminals).


This automaton finishes reading the input word when the heads meet, so the 
whole word is processed: every letter is read by one of the heads. 
Fig. 2. A drawing of a 2-head finite automaton


In transitions we assign a pair of symbols (a, b) to the arrows, meaning that
the first head reads symbol a, the second reads b, and both step. We allow both
a and b to be the sign ε.

Figure 2 shows a sketch of this type of automaton, with the 2 heads and the
directions of their motions.

One of our aims is to develop and analyse this automaton. 
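
To make the acceptance condition concrete, the following Python sketch simulates a non-deterministic 2-head automaton by searching over configurations (state, left position, right position). The dictionary encoding of d, the use of the empty string for ε, the function name `accepts` and the one-state example automaton are assumptions made for illustration.

```python
def accepts(transitions, start, finals, word):
    """transitions: dict (state, a, b) -> set of states; a, b are one letter or ""."""
    seen, todo = set(), [(start, 0, len(word))]
    while todo:
        config = todo.pop()
        if config in seen:
            continue                      # also guards against loops on empty moves
        seen.add(config)
        state, i, j = config              # word[i:j] is the still unread part
        if i == j and state in finals:    # the heads have met in a final state
            return True
        for (source, a, b), targets in transitions.items():
            ni, nj = i + len(a), j - len(b)
            if source != state or ni > nj:
                continue
            if word[i:ni] == a and word[nj:j] == b:
                todo.extend((t, ni, nj) for t in targets)
    return False

# Hypothetical one-state example: both heads step together, reading a and b.
anbn = {("q", "a", "b"): {"q"}}
print(accepts(anbn, "q", {"q"}, "aabb"))
print(accepts(anbn, "q", {"q"}, "aab"))
```

The set of visited configurations is finite for a given word, so the search always terminates.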


4 Properties of the 2-Head Finite Automata 


First in this section we prove that linear languages are exactly those languages which 
are accepted by the 2-head automata. 
The proof consists of two parts. 


Theorem 1. Every linear language is accepted by a 2-head finite automaton. 


Proof. The proof is constructive. Let us start from a grammar of the linear
language in the normal form presented in Lemma 1. Let the states of the
automaton be the non-terminal symbols of the grammar, with initial state s = S.
Add a new state to the automaton as the final state. The alphabet V is the same
as in the grammar. Now we give the transition function: for each rule of the
form A → aB let there be a transition B ∈ d(A, a, ε). For the rules of type
A → Ba let there be a transition B ∈ d(A, ε, a). Finally, for the rules A → a
let the final state be in d(A, a, ε). It is easy to show by the construction
that each derivation has a one-to-one correspondence with a run of the
automaton. Therefore, the automaton accepts exactly the linear language
generated by the grammar.
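
The construction in the proof can be sketched in a few lines of Python. The encoding is an assumption made for illustration: rules are (A, rhs) pairs over single-character symbols, the empty string stands for ε, and the added final state is given the hypothetical name "final".

```python
from collections import defaultdict

FINAL = "final"   # name of the added final state (hypothetical)

def automaton_from_grammar(rules, nonterminals):
    """Build d from normal-form rules A -> aB, A -> Ba, A -> a."""
    d = defaultdict(set)
    for lhs, rhs in rules:
        if len(rhs) == 2 and rhs[0] not in nonterminals and rhs[1] in nonterminals:
            d[(lhs, rhs[0], "")].add(rhs[1])      # A -> aB: first head reads a
        elif len(rhs) == 2 and rhs[0] in nonterminals and rhs[1] not in nonterminals:
            d[(lhs, "", rhs[1])].add(rhs[0])      # A -> Ba: second head reads a
        elif len(rhs) == 1 and rhs not in nonterminals:
            d[(lhs, rhs, "")].add(FINAL)          # A -> a: move to the final state
        else:
            raise ValueError(f"rule {lhs} -> {rhs} is not in normal form")
    return dict(d)

# Example grammar for words a^n b^n with n >= 1: S -> aB, B -> Sb, B -> b
rules = [("S", "aB"), ("B", "Sb"), ("B", "b")]
d = automaton_from_grammar(rules, {"S", "B"})
for key in sorted(d):
    print(key, d[key])
```

The initial state is S and the only final state is the added one, exactly as in the proof.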


Theorem 2. Every language accepted by a 2-head finite automaton is a linear
language.

Proof. Now we construct a linear grammar based on the given automaton. Let the
set V be the same for both the automaton and the grammar. Let the non-terminals
be the representations of the states of the automaton, and let S represent the
initial state. The rules of the grammar are generated from the transition
function. For each transition B ∈ d(A, a, b) let the rule A → aBb be in H
(a, b ∈ (V ∪ {ε})). Finally, for every final state F the rule F → ε is given.
It is easy to check that every run of the automaton has a unique derivation in
the grammar and vice versa. So, the grammar generates the same language as the
one accepted by the automaton.
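
The converse construction is equally short. In the Python sketch below, a rule A → aBb is stored as the tuple (A, a, B, b) and a rule F → ε as (F, "", None, ""); this encoding, the function names and the one-state example automaton are assumptions made for illustration. The second function enumerates the short words of the resulting linear grammar so that the result can be inspected.

```python
def grammar_from_automaton(transitions, finals):
    """One rule A -> aBb per transition B in d(A, a, b), plus F -> empty word."""
    rules = set()
    for (state, a, b), targets in transitions.items():
        for target in targets:
            rules.add((state, a, target, b))
    for state in finals:
        rules.add((state, "", None, ""))
    return rules

def language_up_to(rules, start, max_len):
    """All terminal words of length <= max_len derivable from start."""
    words, seen, todo = set(), set(), [("", start, "")]
    while todo:
        form = todo.pop()                 # sentential form: left + non-terminal + right
        if form in seen:
            continue
        seen.add(form)
        left, nonterminal, right = form
        for lhs, a, target, b in rules:
            if lhs != nonterminal:
                continue
            new_left, new_right = left + a, b + right
            if len(new_left) + len(new_right) > max_len:
                continue
            if target is None:
                words.add(new_left + new_right)
            else:
                todo.append((new_left, target, new_right))
    return words

automaton = {("q", "a", "b"): {"q"}}      # hypothetical example with final state q
rules = grammar_from_automaton(automaton, {"q"})
print(sorted(language_up_to(rules, "q", 4)))
```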


As a special consequence of the previous theorems and constructions we can 
define a ‘normal form’ for this type of automata. 


Consequence 1 For each 2-head finite automaton there is an equivalent one
(accepting the same language as the original one) with only transitions of the
forms B ∈ d(A, a, ε) and B ∈ d(A, ε, a).


This fact is based on the normal form of linear grammars presented in Lemma 1.

In figures the transitions of a normal form automaton can be labelled (a, ε)
and (ε, a). We can use the alternative notations → a and ← a to indicate the
direction of the moving head.

Now let us see a famous example of a linear language: namely, the palindrome
language. This language contains all the words which read exactly the same
both forwards and backwards.

Figure 3 shows the automaton of this language over the binary alphabet.
This automaton is in ‘normal form’: at each transition the arrow shows which head
is moving by reading the terminal letter.


Fig. 3. 2-head finite automaton accepting the palindrome language 
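
A normal-form automaton for the binary palindromes can be written down and tested directly. The three-state layout in the Python sketch below is an assumption made for illustration and not a transcription of Figure 3; the empty string stands for ε.

```python
from functools import lru_cache

# Normal-form transitions: exactly one head moves in each step.
PAL = {}
for x in "01":
    PAL[("q", x, "")] = {"q" + x}      # first head reads x and remembers it
    PAL[("q" + x, "", x)] = {"q"}      # second head must read the same x
FINALS = {"q", "q0", "q1"}             # q0 and q1 accept when x was the middle letter

def accepts(word):
    @lru_cache(maxsize=None)
    def run(state, i, j):              # word[i:j] is the still unread part
        if i == j and state in FINALS:
            return True
        for (source, a, b), targets in PAL.items():
            ni, nj = i + len(a), j - len(b)
            if source == state and ni <= nj and word[i:ni] == a and word[nj:j] == b:
                if any(run(t, ni, nj) for t in targets):
                    return True
        return False
    return run("q", 0, len(word))

print(accepts("0110"), accepts("010"), accepts("011"))
```

Every step of this automaton consumes a letter, so the recursion depth is bounded by the length of the word.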
Now we show that the deterministic version of these automata is weaker, i.e.
it does not accept all linear languages.

Consider the linear language {a^n b^n} ∪ {a^(3n) b^n} (n > 0). It is clear that
it can be accepted by a non-deterministic 2-head finite automaton that tries
both possibilities in a non-deterministic way. A deterministic automaton has to
decide which head moves in which step. With a finite control it is impossible to
know in advance how many steps of the first head should be followed by a step of
the second head.


4.1 Characterization of even linear languages 


In this section we will use a special subclass of linear languages: namely, the even- 
linear ones. Note that the even-linear language class strictly contains the class of 
regular languages. 


Theorem 3. The 2-head automata using transitions of type B ∈ d(A, a, b) and
C ∈ d(A, a, ε), where C is a final state and a, b ∈ V, accept exactly the
even-linear languages, provided there is no transition from the final states
which can be reached by a transition of type C ∈ d(A, a, ε).


Proof. Using the same constructions as in Theorems 1 and 2, the result will be
a special linear grammar, namely an even-linear one. The only transitions of the
automaton that do not finish reading the input word are of type B ∈ d(A, a, b).
This means that both heads must move one step at the same time (using the
normal form of Lemma 1 the relation is obvious). The only exception is when the
input has only 1 unread letter. In this case only the first head steps,
finishing the word and accepting it. Since there is no transition from a final
state that is reached by a transition using only 1 head, the automaton must stop
there even if the input has unread letters.


In the special automata above the process cannot be continued from a final state
which is reached by a step not of type B ∈ d(A, a, b), but from Consequence 1 we
know that these automata can be translated to automata in ‘normal form’.

Now let us examine the deterministic version of these restricted automata. 


Theorem 4. The 2-head deterministic finite automata using transitions of type
B ∈ d(A, a, b) and C ∈ d(A, a, ε), where C is a final state and a, b ∈ V, accept
exactly the even-linear languages, provided there is no transition from the
final states which can be reached by a transition of type C ∈ d(A, a, ε).


Proof. First, it is trivial (and it is a consequence of the previous theorem) that
the languages accepted by the deterministic version must be even-linear languages.
Now we prove that every even-linear language can be accepted by a deterministic
2-head automaton having the above properties. We use a method similar to the
one used for regular languages to obtain a deterministic automaton from a
non-deterministic one. This construction is the so-called set-construction. Let
us start with the automaton (Q, s, V, d, F) described in Theorem 3. Let our new
states (Q') be the possible subsets of the original set Q. Let s' = {s}. The
same alphabet V is used for the deterministic automaton as well. Let the new
transition function d' be determined in the following way. For every pair
a, b ∈ V let the state q'_k = d'(q'_i, a, b) be such that q_l ∈ q'_k if and
only if there is a state q_j ∈ q'_i for which q_l ∈ d(q_j, a, b). For the
possible transitions of the other type (q ∈ d(q_j, a, ε)), let the deterministic
transitions be q_F = d'(q'_i, a, ε) with a new state q_F if q'_i contains q_j.
We allow these transitions for the automaton only in cases when both heads would
read the same place (i.e. the middle) of the tape and, therefore, it is not
possible to step with both heads simultaneously. Let the set of final states
consist of all the states q'_i which contain any element of F, together with
q_F. Finally, the states which are not accessible from the initial one can be
deleted. It is easy to see that this automaton is deterministic and recognizes
the same even-linear language as the original one.


4.2 Comparison with other 2-head automata 


In this section we compare our 2-head automata with other 2-head automata. 

Usually in the literature the heads of the 2-head automata can move in the same 
direction [2]. 

Note that in [8] the so-called Watson-Crick automata are described. From our
viewpoint they are very similar to other 2-head automata in the literature. The
main difference between our automata and these is that in Watson-Crick
automata both heads go in the same direction. (We do not discuss here the fact
that Watson-Crick automata usually work with double strings, such as
double-stranded DNA molecules.)

Now we want to show some examples of formal languages which can be accepted 
by the traditional 2-head automata and/or our new 2-head automata. 

For instance, the language containing all words of the form a^n b^n (n > 1) can
be recognized by both the traditional and the new 2-head machine. (Figure 4
shows the new automaton type for this language.)

A marked version of the so-called ‘copy’ language ({wcw | w ∈ {a,b}*}) can
be recognized by a traditional non-deterministic 2-head machine. The languages
{a^n b^n c^n | n ∈ N} and {a^n b^m c^n d^m | n, m ∈ N} can be accepted by
traditional deterministic 2-head machines. Since these languages are not even
context-free, they are not accepted by any 2-head automaton of the new type.

The new automata accept the language {ww^R | w ∈ V* and w^R is the reverse
of the word w}. This language cannot be accepted by any 2-head finite automaton
with heads moving in the same direction.


Fig. 4. A 2-head finite automaton accepting the language a^n b^n (n > 1)


Finally we present a variation of the new 2-head automata. In this variation
the heads do not know the position of each other. Both heads should read the
whole word, but in different directions.

It is an open problem whether this variation also accepts all the linear
languages.

The automaton may guess when the heads are in the same position and after this
point use the opposite transitions. This means that if there was a transition
for the first head with a terminal, then there will be a transition with the
same terminal for the second head in this second phase, etc.; but it is not easy
to prove that there are no false acceptances.


5 Conclusions 


One can imagine our new automata as automata working on words which are doubled 
up (see Figure 5, where the tape can move). 

To get all linear languages (not only the even-linear ones, in which each rule 
has the same number of terminals before and after the non-terminal (if any) on 
the right-hand-side) we allow transition steps in which only 1 head moves, while 
the other does not (it reads the empty word). A normal-form of the automata is 
presented. Using only 1 head in each step the work of the machine looks like the 
inverse of a derivation tree using the normal form (Lemma 1). 


Fig. 5. 2-head finite automaton with a folded tape 


One can use a special, restricted form of the automata corresponding to the
even-linear languages (for instance, in normal form the heads of the automaton can
only step one after the other). Moreover, it is proved that the even-linear languages
can be accepted by deterministic machines as well.

So we applied the finite state machines to recognize a wider class of languages 
than the original ones with 1 head. These results make it possible to use linear 
languages as simply as regular ones. We would like to extend these automata to 
work on words which are folded several times to characterize other language classes 
as well. 

It would also be interesting to analyse the differences between the language classes
accepted by the variations of the new automata and those accepted by variations
of the known (traditional) 2-head (for instance Watson-Crick) automata.


A note on related works 


Between the time this paper was written and the time it appeared, some work has
been done on this and related topics; see, for instance, [3, 4, 5, 7]. The class of
languages that can be accepted by our deterministic machines is also characterized in [5].
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Summary. In this paper we discuss parallel derivations for context-free, context- 
sensitive and phrase-structure grammars. For regular and linear grammars only 
sequential derivation can be applied, but a kind of “parallelism” is present in lin- 
ear grammars. We show that finite languages can be generated by a recursion-free 
rule-set. It is well-known that in context-free grammars the derivation can be in 
maximal (independent) parallel way. We show that in cases of context-sensitive and 
recursively enumerable languages the parallel “branches” of the derivation have some 
synchronization points. In the case of context-sensitive grammars this synchroniza- 
tion can only be local, but in a derivation of an arbitrary grammar we cannot make 
this restriction. We present a framework to show how the concept of parallelism can 
be fit to the derivations in formal language theory using tokens. 
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1 Introduction 


Chomsky type grammars and the language families they generate form one of the most
basic and important fields of theoretical computer science. The field is fairly old: the
basic concepts and results are from the middle of the last century. On the other hand, at
the end of the twentieth century parallel computing played an increasingly greater
role. In this paper we analyze the derivations of Chomsky type grammars and their
relation to the concept of parallelism. We will use some variations of the well-known
derivation trees and tokens on graphs in their Petri-net forms.

Note that parallelism is not new in formal language theory. The so-called Indian 
and Russian grammars were investigated. These grammars can be related to the 
context-free case. The Lindenmayer systems are also parallel rewriting systems, but 
the language family they produce is orthogonal to the Chomsky-type language family 
(for more about this topic see [14]). In this paper we would like to analyse how the 
concept of parallelism can be present in Chomsky-type grammars. 

In the next section we recall some basic definitions that we will need later on. 
In Section 3 we show that in finite languages a derivation cannot be recursive in- 
dependently of the forms of the rules used. In Sections 4 and 5 we show that the 
derivations must be sequential in linear and regular grammars, but a kind of "paral- 
lelism" is needed in derivations for linear languages. After this we show that in the 
case of context-free grammars the derivation can be maximal parallel without any 
restriction or communication. In Section 7, which is based on Penttonen’s old result
([11]), we build the derivation graphs for context-sensitive grammars in a tree-like
form. With this approach we show that in these cases the derivation can be parallel 
to some synchronization points of the neighboring branches. Finally in type 0 gram- 
mars, using an appropriate normal form we show that synchronization can happen 
between branches at a distance, when all mediate branches terminate by the empty 
word. 

Using our approach we find that the generating power increases when the possi- 
bility of parallelism is present. Moreover with local communication (synchronization) 
it is more powerful. Finally when the synchronization is not merely local we get the 
whole recursively enumerable language class. We also present some interesting fur- 
ther branches of research based on these results. 


2 Preliminaries


In this section we recall some basic concepts and facts about the field of formal
languages and Petri-nets. First the definitions of the Chomsky-type grammars and
the Chomsky hierarchy are shown.
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2.1 Chomsky-type grammars 


A grammar is a construct $G = (N, T, S, H)$, where $N$ and $T$ are the non-terminal and
terminal alphabets, with $N \cap T = \emptyset$; they are finite sets. $S \in N$ is a special symbol,
called the initial letter. $H$ is a finite set of pairs, where a pair is usually written in
the form $v \to w$ with $v \in (N \cup T)^* N (N \cup T)^*$ and $w \in (N \cup T)^*$. (We use the
well-known Kleene-star notation.) $H$ is the set of derivation rules.

Let $G$ be a grammar and $v, w \in (N \cup T)^*$. Then $v \Rightarrow w$ is a direct derivation
if and only if there exist $v_1, v_2, v', w' \in (N \cup T)^*$ such that $v = v_1 v' v_2$, $w = v_1 w' v_2$
and $v' \to w' \in H$. A derivation $v \Rightarrow^* u$ holds if and only if either $v = u$ or there
is a finite sequence of strings connecting them as $v = v_0, v_1, \ldots, v_m = u$ in
which $v_i \Rightarrow v_{i+1}$ is a direct derivation for each $0 \le i < m$. A sequence of letters
$v \in (N \cup T)^*$ derived from $S$ is called a sentential form, while we refer to $u \in T^*$ as
a (terminal) word. We denote the empty word by $\lambda$.

The language generated by a grammar $G$ is the set of terminal words that can
be derived from the initial letter: $L(G) = \{w \mid S \Rightarrow^* w \wedge w \in T^*\}$.

Two grammars are (weakly) equivalent if they generate the same language (mod- 
ulo the empty word). 

Depending on the possible structures of the derivation rules the following classes 
of grammars /languages are considered: 


• Type 0, or phrase-structure grammars: there is no further restriction on the
possible derivation rules.

• Type 1, or context-sensitive grammars: all derivation rules are of the form
$v_1 A v_2 \to v_1 w v_2$, with $v_1, v_2 \in (N \cup T)^*$, $A \in N$ and $w \in (N \cup T)^* \setminus \{\lambda\}$
(except possibly for the rule $S \to \lambda$, in which case $S$ does not occur on any right
hand side of a rule).

• Type 2, or context-free grammars: every rule has the form $A \to v$
with $A \in N$ and $v \in (N \cup T)^*$.

• Linear grammars: each rule has one of the forms $A \to v$, $A \to vBw$, where
$A, B \in N$ and $v, w \in T^*$.

• Type 3, or regular grammars: each derivation rule has one of the following forms:
$A \to w$, $A \to wB$, where $A, B \in N$ and $w \in T^*$.

• Finite languages: in this case the restriction is not actually on the rules, but on
the number of words in the language: it must be finite.


The generated language is regular/ linear/ context-free/ context-sensitive/
recursively enumerable if it is generated by a regular/ linear/ context-free/
context-sensitive/ phrase-structure grammar, respectively. For these families of languages
we use the notations $\mathcal{L}_{reg}$, $\mathcal{L}_{lin}$, $\mathcal{L}_{CF}$, $\mathcal{L}_{CS}$, $\mathcal{L}_{RE}$, respectively, and $\mathcal{L}_{fin}$ denotes the
finite languages.

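The Chomsky-type grammars and language families are well known. The
generating powers of these grammars form the following hierarchy:

$\mathcal{L}_{fin} \subsetneq \mathcal{L}_{reg} \subsetneq \mathcal{L}_{lin} \subsetneq \mathcal{L}_{CF} \subsetneq \mathcal{L}_{CS} \subsetneq \mathcal{L}_{RE}$.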
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2.2 Petri nets 


In this subsection we recall some concepts about Petri-nets based on [12, 2]. Formally
the structure of a Petri-net is a directed bipartite graph $(V, E)$ with two types of
nodes, $V = S \cup T$ ($S \cap T = \emptyset$), and edges $E \subseteq (S \times T) \cup (T \times S)$. The nodes of
the first type (represented by ellipses in the figures) are places, and those of the other
type are the transitions (rectangles). The connections are represented by arrows. The
two end nodes of each arrow are of different types. In Figure 1 an example is shown. We
indicated the places of the net by numbers. At each place there can be a token, so the
state of the system is a binary vector whose dimension is the number of places of the net.
A transition can switch if every place from which an arrow goes to the transition has
a token (i.e. the transition is active iff all its input places have tokens). After a
transition has switched, the tokens move through the transition. This means
that a token will be at each output place and no token will remain at the input
places of the switched transition.

In the figure, starting with a token at 1 (i.e. system-state (10000000)), only 1
transition is active and the result will be the vector (01100000) after it switches.
After the next step we have (00111100). Now three of the transitions are active.
Suppose that the one that needs tokens at 3 and 6 switches, resulting in (00011110). Here
a token at 6 is used, but it is at the same place after the transition. We will call this
type of connection a context-connection. Note that we allow transitions without
outgoing edges, which allows us to delete a token from the system. This happens
with the token at 4 after the next transition switches. Finally, after the last possible
transition the system is in state (00001011). There are no more active transitions,
and the system halts. Note that the last two transitions are independent, so they
can go at the same time in parallel.
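The switching rule above can be replayed mechanically. The following is a minimal sketch, not the author's construction: the net `NET` is a hypothetical reconstruction chosen only so that it reproduces the state vectors quoted in the text (Figure 1 itself is not available here), and the names `active`, `switch`, `vector` and `run` are illustrative.

```python
# Hypothetical reconstruction of the net of Figure 1: each transition maps to
# (input places, output places). Only the state vectors quoted in the text
# were used to choose these sets; the real figure may differ.
NET = {
    "t1": ({1}, {2, 3}),
    "t2": ({2}, {4, 5, 6}),
    "t3": ({3, 6}, {6, 7}),  # place 6 is a context: used and put back
    "t4": ({4}, set()),      # no outgoing edge: the token is deleted
    "t5": ({6}, {8}),
}


def active(marking, net=NET):
    """Names of the transitions whose input places all hold a token."""
    return sorted(t for t, (ins, _) in net.items() if ins <= marking)


def switch(marking, name, net=NET):
    """Fire one transition: clear its input places, mark its output places."""
    ins, outs = net[name]
    if not ins <= marking:
        raise ValueError(f"{name} is not active")
    return (marking - ins) | outs


def vector(marking, places=8):
    """Render a marking as the binary state vector used in the text."""
    return "".join("1" if p in marking else "0" for p in range(1, places + 1))


def run(names, start=frozenset({1})):
    """Fire the named transitions in order and collect the state vectors."""
    marking = set(start)
    states = [vector(marking)]
    for name in names:
        marking = switch(marking, name)
        states.append(vector(marking))
    return states


print(run(["t1", "t2", "t3", "t4", "t5"]))
```

Under these assumptions the run passes through the states (10000000), (01100000), (00111100), (00011110) and ends in (00001011), where no transition is active any more.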

We will simulate the possible derivation processes at various types of grammars 
with the help of Petri-nets. The derivations have a widely used graphical representa- 
tion (basically for context-free grammars), which we will use in the sections below. 
In this paper we will use Petri-nets that start with only 1 token and finish the pro- 
cess with exactly the same number of tokens as the number of letters in the derived 
word. In our nets the states will be labelled with letters. The tokens at non-terminal 
labelled places are “living” tokens and at the terminal labelled places the tokens are 
not “living”. The transitions will be the derivation rules used. 

To read the derived word an order of the leaf-places will be used. A derivation
is successful if and only if the leaves have tokens and therefore the system halts.
As we will see, a system may halt without a successful derivation.
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Fig. 1. An example for a Petri-net 


3 Parallelism and Finite Languages 


From this section we use the Chomsky hierarchy in reverse order. We start from 
the smallest family (the finite one), and we go in the direction of the more effective 
grammars. 

For the Chomsky-type grammars there are so-called normal forms, in which the 
form of the derivation rules used are more restricted than in the original definition. 
It is also well-known that using only such restricted-form rules the generating power 
remains the same. 

From now on we do not care whether the empty word is in the language $L$ or not.
It is obvious that using most normal forms the resulting language is $L \setminus \{\lambda\}$. (If one
wants to generate the original language including $\lambda$, then one can always add the new
rules $S_0 \to \lambda$, $S_0 \to S$ to the set of rules, where $S_0$ is a new non-terminal that
serves as the initial letter of the new grammar and $S$ was the initial letter
of the old grammar.)

Let us review the case of finite languages. It is a very important subfamily of 
regular languages (recently, for instance the so-called cover automata are used to 
describe them). 

First, there is a normal form for these languages. Let $L$ be a finite language. One
can use the normal form containing only rules of the form $S \to w$, where $w \in T^*$
($S \to w_i$ for each $w_i \in L$).

In many cases the normal form above is not an efficient way of generating
languages. For this reason, we usually have other types of rules. To increase the
efficiency of the derivations (the size of the grammar description and the generation
speed, i.e. to obtain a system with a small number of derivation steps and a small
cardinality of the rule set $H$), we can allow stronger generating rules (for instance
CF or CS rules) as well (as they can be used in coding, for instance).

Finite languages form a special class of regular languages that have no cycle in 
the automaton. When a non-terminal has been replaced in a derivation, it can never 
again be in the sentential form. (We call this the anti-pumping property of finite 
languages.) 

Apart from using a special regular grammar to generate a finite language we can
use a grammar in which there is no restriction on the form of the rules, so parallel
derivation can be used. For this effective generating method, we need a restriction on
the non-terminals used: namely, they must have at least a partial ordering relation.
We can formulate this in the following way.


Theorem 1. Let G be a grammar with an ordering relation among the non-terminals
such that

- S is the smallest,

- if there is a rule $u \to v$ in which the non-terminal A is rewritten, then each
non-terminal appearing in v in the place of A is strictly greater than A.

Then G generates a finite language.


Proof. For context-free rules the second condition clearly implies that after a step
rewriting the non-terminal A, this occurrence of A disappears and only greater
non-terminals appear in its place. The number of non-terminals is finite, which implies
that every derivation finishes in a bounded number of steps. The same holds for
context-sensitive and type 0 rules as well. Starting from the initial letter, the sentential
form only has non-terminals with higher values. (In CS rules the context remains, but in
the substituted part the value of the non-terminals increases. In arbitrary rules each of
the non-terminals on the right-hand side has a greater value than the highest on the
left-hand side. Therefore there is no way of pumping a word.)
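For context-free rules the condition of Theorem 1 can be checked directly. The sketch below is only an illustration under stated assumptions: symbols are single characters, upper-case letters are non-terminals, and the example grammar `RULES` is hypothetical (it does not come from the paper). A suitable strict ordering exists exactly when the "A is rewritten into something containing B" relation has no cycle, and in that case the whole (finite) language can be enumerated recursively.

```python
from itertools import product

# Hypothetical context-free rule set with the ordering S < A < B.
RULES = {"S": ["AB", "aA"], "A": ["a", "bB"], "B": ["b", "c"]}


def has_strict_order(rules):
    """True iff no non-terminal can reach itself, i.e. an ordering exists."""
    graph = {a: {s for rhs in rhss for s in rhs if s.isupper()}
             for a, rhss in rules.items()}
    state = {}

    def visit(a):
        if state.get(a) == "done":
            return True
        if state.get(a) == "open":  # we came back to a node still in progress
            return False
        state[a] = "open"
        ok = all(visit(b) for b in graph.get(a, ()))
        state[a] = "done"
        return ok

    return all(visit(a) for a in graph)


def language(rules, symbol="S"):
    """All terminal words derivable from `symbol` (needs a recursion-free rule set)."""
    if not symbol.isupper():
        return {symbol}
    words = set()
    for rhs in rules[symbol]:
        parts = [language(rules, s) for s in rhs]
        words |= {"".join(choice) for choice in product(*parts)}
    return words


print(has_strict_order(RULES), sorted(language(RULES)))
print(has_strict_order({"S": ["aSb", "ab"]}))  # S is rewritten into S: no ordering
```

The check covers only the second condition of the theorem; that S is the smallest element is taken for granted here, since only non-terminals reachable from S matter for the generated language.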
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How the “efficiency” of a grammar generating, for example, a finite language 
(using for instance, context-free rules) can be measured is an interesting question. 


4 Derivations in Regular Grammars 


First, normal forms for generating regular languages are recalled. There are several 
alternative forms of these types of grammars. 

A grammar is called right-(left-)linear if all derivation rules are of the forms
$A \to uB$, $A \to u$ ($A \to Bu$, $A \to u$) with $A, B \in N$ and $u \in T^*$. Each regular
language can be generated by either a left-linear or a right-linear grammar. More
restricted regular forms of the grammar can also be used. Each regular grammar
has equivalent grammars that use only rules of the form $A \to aB$, $A \to a$ (or
$A \to Ba$, $A \to a$) with $A, B \in N$ and $a \in T$.

Now we show examples for regular derivations both in the usual “derivation-tree” 
and the Petri-net forms. 


Example 1. Let $G_1 = (\{S, A, B, C\}, \{a, b\}, S, \{S \to aS, S \to baA, A \to aA, A \to baaB,
B \to aB, B \to b, B \to baaaC, C \to aC, C \to a\})$ be a regular grammar.
Figure 2 shows a derivation in this system.
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Fig. 2. Derivation in a regular grammar 
Example 2. Let $G_2 = (\{S, A\}, \{a, b\}, S, \{S \to aA, A \to bS, A \to b\})$ be a regular
grammar. Figure 3 shows a derivation in this system by a Petri-net that initially
has 1 token in the top position.


Fig. 3. Petri-net representing a derivation of a regular grammar 


In Figure 2 a so-called derivation-tree is shown. In these graphs all leaves are
labelled by a terminal symbol (or sometimes by the empty word $\lambda$). All other nodes
(they are labelled by non-terminals) must have at least one successor.
Using the Petri-net representation of a derivation we can assume that the net
starts with only one token at the start-node. When the tree branches, a living token
that passes through a transition is multiplied in the following way: it
disappears from its original place and there will be a token at each successor place.
The derivation is (successfully) finished when all leaves (terminal labelled places)
have a token, and there are no more tokens in the graph. In this case there is no
living token in the net any more. In regular cases the process can only go in one
order.

Using only rules of types $A \to aB$, $A \to a$ the derivation goes letterwise. Starting
from the first letter of the word and proceeding to the last one, the derivation gives one
letter in each step. This is a totally sequential derivation. In all regular languages every
word can be built letter by letter from the beginning.
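The letterwise, totally sequential character of such derivations can be sketched in a few lines. The rule list below is our reading of the rule set of Example 2 (S → aA, A → bS, A → b); the triple encoding and the function name `derive` are illustrative assumptions, not notation from the paper.

```python
# Rules of the forms A -> aB and A -> a, encoded as (head, letter, next),
# with next = None for a terminating rule A -> a.
G2 = [("S", "a", "A"), ("A", "b", "S"), ("A", "b", None)]


def derive(word, rules=G2, start="S"):
    """Return the sentential forms of a derivation of `word`, or None."""

    def go(state, pos):
        for head, letter, nxt in rules:
            if head != state or pos >= len(word) or word[pos] != letter:
                continue
            prefix = word[:pos + 1]  # exactly one new letter per step
            if nxt is None:
                if pos == len(word) - 1:
                    return [prefix]
            else:
                rest = go(nxt, pos + 1)
                if rest is not None:
                    return [prefix + nxt] + rest
        return None

    forms = go(start, 0)
    return None if forms is None else [start] + forms


print(derive("abab"))  # ['S', 'aA', 'abS', 'abaA', 'abab']
print(derive("aba"))   # None
```

Each sentential form contains at most one non-terminal, at its right end, so there is never a choice about where to continue; the only choice is which rule to apply.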

We discuss some of our results and comments about regular grammars in the
section below in comparison to the next class, the linear one.


5 Derivations in Linear Grammars 


As we can see in regular and linear grammars each rule has a non-terminal on the left 
hand side and at most one non-terminal on the right hand side. (Linear grammars 
are special type 2 grammars with at most 1 non-terminal on the right hand side of 
each rule.) 

Therefore we have the following statement. 


Proposition 1. The derivations in a linear (or regular) grammar can only be in 
sequential mode. 


Proof. Starting from the initial letter the sentential form contains at most 1
non-terminal in each step. When the sentential form has a non-terminal we must replace
it by a rule in the next step. Without a non-terminal the derivation cannot continue.
It is terminated with the word containing only terminals.


The only difference between the regular and the linear case is the following. In a
derivation of a regular grammar the same side is always used for further derivation;
all the terminals appear on the other side of the non-terminal. These derivations
are totally sequential ones. In linear grammars the terminals can appear on both
sides of the derivation graph. A kind of parallelism appears in these
derivations: both sides of the word can (and must) be built in parallel. In the general
case a word cannot be obtained from the beginning to the end.

Now, a normal form for linear grammars is presented. 


Lemma 1. Every linear grammar has an equivalent grammar in which all rules are
of the forms $A \to aB$, $A \to Ba$, $A \to a$ ($A, B \in N$, $a \in T$).
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Proof. Introducing new non-terminals each linear rule can be replaced by a se- 
quence of rules in the desired forms. 
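One possible way to carry out this replacement is sketched below. It is an illustration only: it assumes that no rule is a unit rule A → B or an empty rule A → λ, that a rule A → vBw is encoded as the tuple `(A, v, B, w)` (with `B = None` for A → v), and that the fresh names `X1`, `X2`, ... do not clash with existing non-terminals. All of these conventions are ours, not the paper's.

```python
from itertools import count


def to_normal_form(rules):
    """Split linear rules into rules of the forms A -> aB, A -> Ba, A -> a."""
    fresh = (f"X{i}" for i in count(1))  # hypothetical new non-terminals
    out = []
    for head, v, mid, w in rules:
        if not v and (mid is None or not w):
            raise ValueError("unit and empty rules are not handled in this sketch")
        cur = head
        if mid is None:
            # A -> a1 a2 ... ak becomes a chain that ends with X -> ak
            for a in v[:-1]:
                nxt = next(fresh)
                out.append((cur, a, nxt, ""))
                cur = nxt
            out.append((cur, v[-1], None, ""))
        else:
            # peel the left terminals first, then the right ones from outside in
            steps = [(a, "") for a in v] + [("", b) for b in reversed(w)]
            for i, (left, right) in enumerate(steps):
                nxt = mid if i == len(steps) - 1 else next(fresh)
                out.append((cur, left, nxt, right))
                cur = nxt
    return out


# The rules S -> ab and S -> aSb, generating the words a^n b^n (n >= 1)
for rule in to_normal_form([("S", "ab", None, ""), ("S", "a", "S", "b")]):
    print(rule)
```

Each output tuple `(A, left, B, right)` carries exactly one terminal, so it stands for A → aB (terminal in `left`), A → Ba (terminal in `right`) or A → a (`B` is `None`).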


Now we show examples for derivations in linear grammars. 


Example 3. Let $G_3 = (\{S, A, B\}, \{a, b\}, S, H_3)$ be a linear grammar, with rule set
$H_3 = \{S \to aA, A \to Sa, A \to a, S \to bB, B \to Sb, B \to b, S \to a, S \to b\}$. Figure 4
shows a derivation in this grammar.


Fig. 4. Derivation in a linear grammar with rules in normal form 


Example 4. Let $G_4 = (\{S\}, \{a, b\}, S, \{S \to ab, S \to aSb\})$ be a linear grammar.
Figure 5 shows a Petri-net representation of a derivation in this system.


Remark 1. When a grammar only has rules of types $A \to aB$, $A \to Ba$, $A \to a$ the
derivation graph is a binary tree. All non-terminal nodes except the last one have
two successor nodes and exactly one of them is a non-terminal node.


Based on Proposition 1 the following statement is true. 


Corollary 1. The derivation trees and the derivations in linear (or especially regu- 
lar) grammars have one-to-one correspondence. 
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Fig. 5. Petri-net for derivation in a linear grammar 


The derivation processes that use the Petri-net representation in linear and
regular cases can only go in one order: the order of switching of the transitions is
determined by the net. The regular and the linear cases only differ in the
ordering of the output places and leaves. Based on these facts linear languages can be
recognized by finite automata equipped with two heads [10, 8].
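The connection between normal-form linear grammars and two reading heads can be illustrated as follows. This is a sketch under assumptions, not the construction of [10, 8]: the state is the current non-terminal, a rule A → aB lets the left head read a, a rule A → Ba lets the right head read a, and a rule A → a reads the last unread letter. The rule list is our reading of the rule set of Example 3, which generates the non-empty palindromes over {a, b}; non-determinism is handled by a memoised search.

```python
from functools import lru_cache

# (head, left terminal, next non-terminal, right terminal); our reading of H_3
G3 = [
    ("S", "a", "A", ""), ("A", "", "S", "a"), ("A", "a", None, ""),
    ("S", "b", "B", ""), ("B", "", "S", "b"), ("B", "b", None, ""),
    ("S", "a", None, ""), ("S", "b", None, ""),
]


def accepts(word, rules=G3, start="S"):
    """Simulate two heads moving towards each other from the two ends."""

    @lru_cache(maxsize=None)
    def run(state, i, j):
        # the unread part of the input is word[i:j]
        if i >= j:
            return False
        for head, left, nxt, right in rules:
            if head != state:
                continue
            if nxt is None:
                if j - i == 1 and word[i] == left:
                    return True       # the heads meet on the last letter
            elif left:
                if word[i] == left and run(nxt, i + 1, j):
                    return True       # left head steps to the right
            elif word[j - 1] == right and run(nxt, i, j - 1):
                return True           # right head steps to the left
        return False

    return run(start, 0, len(word))


print([w for w in ["a", "ab", "aba", "abba", "abab"] if accepts(w)])
```

In each step only one head moves, which mirrors the remark that the work of the machine looks like the inverse of a derivation in normal form.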

The concept of derivation-trees is more complete in the context-free grammars, 
as the next section will present. 


6 Derivations in Context-Free Grammars 


CF grammars are very popular because the concept of derivation trees fits these
derivations very well. CF grammars are more powerful than the previous linear and
regular grammars. The left hand side of each rule contains only one non-terminal as
in the linear or regular case, but there is no restriction on the right hand side.

The following fact about parallelism of the "divide into smaller (easier)
subproblems" type is well known.
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Proposition 2. In a context-free grammar the derivation can go in a maximal par- 
allel way. The derivation tree can be built by levels: i.e., every non-terminal of the 
sentential form can be rewritten using a derivation rule at the same time. 


Proof. It follows from the structure of the derivation rules. In each rule application a
non-terminal is replaced independently of the other parts of the sentential form.
For each derivation (knowing the replaced non-terminal and the applied rule) there
is a unique derivation tree. But we get the same result as the original derivation
if all non-terminals of the sentential form are replaced in each step. In this way we
build a complete level of the derivation tree in each step.
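A level-by-level derivation of this kind can be sketched as follows. The rule set is our reading of the Chomsky normal form grammar of Example 6 (S → SS, S → AB, S → AC, C → SB, A → a, B → b); the function names and the way the choices are supplied (one right-hand side per non-terminal, from left to right) are assumptions made for the illustration.

```python
# Our reading of the rule set H_6 of Example 6
H6 = {"S": {"SS", "AB", "AC"}, "C": {"SB"}, "A": {"a"}, "B": {"b"}}


def parallel_step(form, choices, rules=H6):
    """Rewrite every non-terminal of `form` at once, one chosen rule each."""
    picks = iter(choices)
    out = []
    for symbol in form:
        if symbol in rules:
            rhs = next(picks)
            if rhs not in rules[symbol]:
                raise ValueError(f"{symbol} -> {rhs} is not a rule")
            out.append(rhs)
        else:
            out.append(symbol)  # terminals are copied unchanged
    return "".join(out)


def derive_by_levels(levels, start="S", rules=H6):
    """Apply one maximal parallel step per level of the derivation tree."""
    forms = [start]
    for choices in levels:
        forms.append(parallel_step(forms[-1], choices, rules))
    return forms


print(derive_by_levels([["AC"], ["a", "SB"], ["AB", "b"], ["a", "b"]]))
# ['S', 'AC', 'aSB', 'aABb', 'aabb']
```

Here seven rule applications are grouped into four parallel steps, one per level of the derivation tree; a sequential derivation of the same word needs seven steps.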


Corollary 2. The possible sequential derivations in context-free grammars form
equivalence classes. Each class can be represented by a derivation tree. Each
derivation tree can be represented by a unique sequential derivation, the so-called
left-most derivation.


In this way, one can assume that the nodes of the derivation tree are problems, 
the child nodes are the subproblems and the terminal-labeled nodes are the easily- 
solved (or trivial) problems. The derivation is “more parallel” in the context-free 
case than in the linear one. The word can be built in many places at the same time 
independently. 


Example 5. Let $G_5 = (\{S, A\}, \{a, b, c, d\}, S, H_5)$ be a context-free grammar, with
rule set $H_5 = \{S \to AbA, S \to cA, A \to a, A \to dSd\}$. Figure 6 shows a
derivation-tree in this system.


Now, we recall possible normal-forms for context-free grammars. 


Fact 1 For each context-free grammar there is an equivalent grammar in which all
derivation rules are of one of the forms $A \to BC$, $A \to a$ ($A, B, C \in N$, $a \in T$). A
grammar that only has these kinds of rules is in the so-called Chomsky normal form.


Using the Chomsky normal form the tree has a special binary tree form. Each 
node labeled by a non-terminal has two successor nodes labeled by non-terminals or 
only one successor node labeled by a terminal (leaf-node). 

In the next example we generate the Dyck language. 


Example 6. Let $G_6 = (\{S, A, B, C\}, \{a, b\}, S, H_6)$ be a context-free grammar in
Chomsky normal form, with rule set $H_6 = \{S \to SS, S \to AB, S \to AC, C \to SB,
A \to a, B \to b\}$. Figure 7 shows a derivation-tree in Petri-net form in this
system.
Fig. 6. Derivation in a context-free grammar


As the figures show the parallel branches of the trees are independent of each 
other. The branches of the derivation use their tokens independently. In the case of 
CF (and also regular and linear) grammars the sequence of switching of the Petri-net 
of a derivation(-tree) must be finished by a successful derivation, since there is at 
least 1 active transition until the system-state equals the state with tokens exactly 
on the leaves. 

In the sections below we show that the concept of derivation trees and the parallel 
derivations can be used in context sensitive and phrase structured grammars as well 
(for more details, see [6, 9]). 


7 Derivations in Context-Sensitive Grammars 


Normally we can use sequential derivations with the sentential forms in CS case. 
The concept of derivation trees does not work in pure form. The neighborhood of 
a non-terminal can also be important using a replacing rule. In the “old days” of 
formal language theory various attempts were made to describe the derivations of 
context-sensitive grammars by tree-like structures. In general, the results were not 
satisfactory. In this section we present a new kind of derivation structure, which 
may be useful. 

Fig. 7. Petri-net form of a derivation in a Chomsky normal form grammar


We use two kinds of edges in these derivation graphs. The original, derivation
edges come from the replaced non-terminal and go to the new parts of the string
given in the right hand side of the derivation rule used. The new type of edges
(represented by boxes and dotted arrows) show the neighborhood of the non-terminals
replaced as it is requested by the applied rule. We will use the names context-box
and context-edge.


Example 7. Let $G_7 = (\{S, A, B, C, D, E, F, G, I, J, K, L, M, O, P\}, \{a, b, c\}, S, H_7)$
be a context-sensitive grammar, with rule set $H_7 = \{S \to aSA, S \to bSB,
abS \to abCE, baSA \to baDFA, EA \to EG, EG \to IG, IG \to IE, IE \to AE,
EB \to EJ, EJ \to KJ, KJ \to KE, KE \to BE, FA \to FL, FL \to ML,
ML \to MF, MF \to AF, FB \to FO, FO \to PO, PO \to PF, PF \to BF,
CA \to CE, CB \to CF, DA \to DE, DB \to DF, C \to a, D \to b, E \to a, F \to b\}$.
Figure 8 shows a possible derivation-graph in this system.


The derivation here can have parallel branches, but the solutions of the subprob- 
lems are not necessarily independent. Sometimes in order to continue the work on the 
solution of a subproblem results are needed from other (neighboring) subproblem- 
solutions. (Communication is used in this way among neighboring branches.) 

Now we use Penttonen’s result: 


Fact 2 Every context-sensitive language can be generated by a grammar whose
derivation rules are of the form $A \to BC$, $AB \to AC$, $A \to a$, where $A$, $B$ and
$C$ are nonterminals and $a$ is a terminal. This normal form is from ([11]), where it
was called the one-sided normal form.


Fig. 8. “Derivation-tree” in a context sensitive grammar with context-boxes


Using this normal form the derivation 'tree', and the derivation graph in general,
has a simpler structure: each context-box contains only a left-neighbor non-terminal.


Example 8. Let $G_8 = (\{S, A, B, C, D, E, F, G, I, J, K, L, M, O\}, \{a, b, c\}, S, H_8)$ be a
context-sensitive grammar in Penttonen normal form, with rule set $H_8 = \{S \to AG,
G \to BC, A \to IJ, J \to DE, EB \to EE, EC \to EK, K \to FL, D \to IM, M \to AB,
BE \to BB, BF \to BO, O \to CL, A \to a, B \to b, C \to c, D \to a, E \to b, F \to c,
I \to a, L \to c\}$. Figure 9 shows the Petri-net of a possible derivation-graph in this
system.
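The effect of the one-letter left context can be explored with a small search over sentential forms. The sketch below is an illustration under assumptions: the rule list is our reading of the (partly garbled) rule set of Example 8, symbols are single characters with upper case marking non-terminals, and `words_up_to` is a hypothetical helper. Because no rule of this form shortens the sentential form, bounding the length makes the search finite.

```python
from collections import deque

# Our reading of H_8 (Example 8), as (left-hand side, right-hand side) pairs
H8 = [
    ("S", "AG"), ("G", "BC"), ("A", "IJ"), ("J", "DE"), ("EB", "EE"),
    ("EC", "EK"), ("K", "FL"), ("D", "IM"), ("M", "AB"), ("BE", "BB"),
    ("BF", "BO"), ("O", "CL"), ("A", "a"), ("B", "b"), ("C", "c"),
    ("D", "a"), ("E", "b"), ("F", "c"), ("I", "a"), ("L", "c"),
]


def words_up_to(max_len, rules=H8, start="S"):
    """Terminal words of length <= max_len, found by breadth-first search."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if form.islower():           # only terminals left: a generated word
            words.add(form)
            continue
        for lhs, rhs in rules:
            pos = form.find(lhs)     # a rule AB -> AC needs A right before B
            while pos != -1:
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return words


print(words_up_to(3))  # {'abc'}
```

A rule such as EC → EK is applicable only at a position where E stands immediately to the left of C, which is the textual counterpart of the context-edge in the derivation graph.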


Remark 2. In a context-sensitive grammar the derivation can be parallel, but when 
the context is important a synchronization point is needed. Therefore, the derivation 
is not maximal parallel; the ‘speeds’ of the branches usually differ. 


TRIANGLE 8 • June 2012


116 B. Nagy


Fig. 9. Petri-net of a “derivation-tree” in a Penttonen normal form grammar


There is a token for each non-terminal in the graph that is present at the same
time (in the actual sentential form). When a rule is used that has more
non-terminals on its right hand side than on its left hand side, the graph will have
more tokens than before. For example, at the rule $S \to AG$ a new token is born, as
in the previous (i.e. CF) cases. The rules that need context can be used only if all the
letters in their context have tokens. For instance, in the rule $EC \to EK$
the non-terminal $C$ can be replaced by $K$ only if there are tokens at $C$ and at
its left neighbor in the derivation, which is a node labelled by $E$. Then a
token moves from the node labelled by $C$ to the node $K$ on a normal graph edge. The
derivation is successfully terminated if all the leaves have tokens, and there are no
other tokens in the graph. As we can see, we cannot generally use the concept of ‘level’
in these graphs.
A derivation from a non-terminal can be continued once all branches have passed
the points where this non-terminal (as a place in the net) is needed as (part of) a
context. This means that the non-terminal has been used at all the context-edges
which contain it; it satisfies the other branches of the derivation at the meeting
points.

Using Penttonen’s result, the derivation graph has a simpler form. The derivation 
can go left to right. The leftmost branch does not depend on other branches. The 
next branch may need context somewhere, and it can be found on the finished left 
neighbor branch. It is important that these context edges cannot cross each-other. 
Of course a derivation graph usually represents more than one sequential derivation. 
From the graph the “left-most” derivation of a word in a CS grammar can be obtained 
in the following recursive way. 

Use the left-most branch up to the first place at which a context edge starts. 
Then use the next branch till this point and use the context edge as well. (When 
a context edge starts in this branch, then the right neighboring branch must be 
derived to this point as well, and so on.) When a branch terminates, the next one 
is its right-hand neighbor (at the lowest branching point). When all branches have 
terminal letters on their end, the derivation is finished. 

In the Petri-net form of the derivations the “appearance check” appeared. In 
these transitions more tokens are needed, but only one of them "develops" after the 
transition (i.e. it is cancelled from its original place and appears in new place(s)). 
All others will be at the same place as before this step. 

The derivation process in context-free (and especially in linear and regular) cases
can halt only without a “living token”, finishing the derivation of a word. In the case
of context-sensitive derivations the net can halt without terminating the derivation,
so the applications of the rules have a new restriction. This restriction is represented
by the context edges and the context-connections of the net.

As we can see, if a kind of synchronization (communication, or appearance check) 
is used among the parallel branches, the generating power of the grammar increases. 
If Penttonen normal form is used our notation is redundant. The context arrows or 
context boxes can be deleted, because the context used must be a ‘one letter left- 
context’. Using this kind of approach of context-sensitive derivations an algorithm 
based on the Cocke-Younger-Kasami algorithm can be presented for CS-parsing. 
(This could be the subject of future study.) 


8 Derivations in RE (Arbitrary) Grammars 


With the original form of these grammars (see Section 2.1) we cannot say anything 
about parallelism, because we cannot be sure about what kind of context will be 
needed and how it will change in a future derivation step. Using sentential forms, the
derivation can be carried out sequentially, but we would like to say something more.

A normal form of the grammars can again be of help. There are many normal
forms for phrase-structure grammars. We will use the following facts (they can be
found in [15, 4]).


Fact 3 Each phrase-structure grammar is equivalent to a grammar with only context-
sensitive rules and a single additional rule $A \to \lambda$.

Every recursively enumerable language can be generated by a grammar containing
rules only of the forms $A \to BC$, $AB \to AC$, $A \to a$ and $A \to \lambda$
(where $A, B, C \in N$, $a \in T$).


With the normal form given in Fact 3, our case looks like the context-sensitive
case except for the rule(s) $A \to \lambda$ (where $A \in N$). Let us check what
difference these erasing rules make.


Example 9. Let Gs = $(\{S, A, B, C\}, \{a, b, c\}, S, \{S \to AB, B \to BC, B \to \lambda,
AC \to AA, A \to a, B \to b, C \to c\})$ be a phrase-structure grammar. Figure 10 shows a
possible derivation graph in this system.


Fig. 10. "Derivation-tree" in a phrase structured normal form grammar 


As we can see, the context edges can connect nodes which are far from each
other when the empty word ($\lambda$) is derived from all the branches in between. In these
cases the node which is needed as a context must wait until the derivation is finished
by the empty word in the other branches, and then it can be used as the context of a
further node.
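
The effect of the erasing rule can be replayed on sentential forms with a few lines of
Python. This is only an illustrative sketch: the helper `apply_rule`, the encoding of
$\lambda$ as the empty string and the chosen sequence of steps are assumptions made here,
and the derivation shown is one possible derivation in the grammar of Example 9, not
necessarily the one drawn in Figure 10.

```python
# Rules of the grammar of Example 9 as (left side, right side) pairs.
# The empty string "" stands for the empty word lambda.
RULES = [("S", "AB"), ("B", "BC"), ("B", ""), ("AC", "AA"),
         ("A", "a"), ("B", "b"), ("C", "c")]


def apply_rule(form, rule, pos):
    """Rewrite the occurrence of the rule's left side starting at index pos."""
    lhs, rhs = rule
    if rule not in RULES:
        raise ValueError("not a rule of the grammar")
    if form[pos:pos + len(lhs)] != lhs:
        raise ValueError(f"{lhs!r} does not occur at position {pos} of {form!r}")
    return form[:pos] + rhs + form[pos + len(lhs):]


# One hypothetical derivation of the word "aa".
steps = [(("S", "AB"), 0), (("B", "BC"), 1), (("B", ""), 1),
         (("AC", "AA"), 0), (("A", "a"), 0), (("A", "a"), 1)]

form = "S"
trace = [form]
for rule, pos in steps:
    form = apply_rule(form, rule, pos)
    trace.append(form)

print(" => ".join(trace))
```

In the form ABC the context A is separated from C by the branch rooted at B. Only after
B has been erased do A and C become neighbours, and only then can the rule
$AC \to AA$ be applied.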

A new phenomenon appears in these cases: the derivation process loses the tokens
at the leaves containing the empty word. (In all previous sections the derivation
processes did not lose tokens.) In the Petri-net representation the transitions
without outgoing edges represent these $\lambda$-rules.


9 Conclusions 


An approach to generating a finite language efficiently has been shown (with any kind of
derivation rules allowed). In the case of regular and linear grammars the derivation must
be sequential; the sequence of switching transitions of the corresponding Petri-net
is unique. The generating power of a grammar is larger if the possibility of parallelism
exists. In linear grammars the word is built in parallel in two places, i.e. the
order of leaves varies. In context-free grammars the derivation can be maximally parallel:
the word can be built in several places at the same time and the branches
of the derivation are independent of each other. The Petri-net forms a tree, the
number of possible switching sequences of transitions is usually high and they all
represent successful derivations. As we have seen, the generating power increases when
synchronization (communication or appearance checking) is allowed between the parallel
branches. In the context-sensitive case it is enough to use only the left-neighboring
token in this synchronization. A sequence of switching transitions may not wait
for these synchronization points, so the net can halt without a successful derivation.
Moreover, if $\lambda$-rules are allowed, this context communication can connect nodes
which are far from each other, so the efficiency is similar to that of Turing machines.
In our paper we have combined the concept of derivation in a grammar with
the theory of Petri-nets. Using grammars in normal forms, at each token-multiplying
step only one new token may appear. The parallelism of context-free derivations is
our basis. With other kinds of restriction (for instance ordering the non-terminals)
parallelism can be used in derivations of finite (regular/linear) grammars as well.
Analyzing what effectiveness means when parallelism is involved is a topic of further
research. On the basis of the derivation graph presented for the context-sensitive
case we are working on a CYK-like parsing algorithm for grammars in Penttonen
normal form.

As the first version of this paper was written in 2004 and it appears in 2010, some of
the ideas presented here have been developed further in, for instance, [7, 9, 6, 8].


Summary. Identification of deterministic finite automata (DFAs) has an extensive
history, both in passive learning and in active learning. Intractability results by Gold 
[5] and Angluin [1] show that finding the smallest automaton consistent with a set 
of accepted and rejected strings is NP-complete. Nevertheless, a lot of work has 
been done on learning DFAs from examples within specific heuristics, starting with 
Trakhtenbrot and Barzdin’s algorithm [15], rediscovered and applied to the discipline 
of grammatical inference by Gold [5]. Many other algorithms have been developed, 
the convergence of most of which is based on characteristic sets: RPNI (Regular 
Positive and Negative Inference) by J. Oncina and P. Garcia [11, 12], Traxbar by 
K. Lang [8], EDSM (Evidence Driven State Merging), Windowed EDSM and Blue- 
Fringe EDSM by K. Lang, B. Pearlmutter and R. Price [9], SAGE (Self-Adaptive 
Greedy Estimate) by H. Juillé [7], etc. This paper provides a comprehensive study 
of the most important state merging strategies developed so far. 


1 Introduction 


The problem of DFA identification from examples was first mentioned in a paper
by Gold [4] back in 1967, when he also introduced the notion of learning formal
languages. Motivated by observing how children acquire their first language, he
suggested that learning is an infinite process of guessing grammars that does not
terminate in a finite number of steps but only converges in the limit.

In 1973, Trakhtenbrot and Barzdin described a polynomial time algorithm 
(henceforth denoted TB) for constructing the smallest DFA consistent with a com- 
pletely labeled training set (a set that contains all the words up to a certain length). 

Five years later Gold rediscovered the TB algorithm and applied it to the
discipline of grammatical inference (uniformly complete samples are not required). He
also specified the way to obtain indistinguishable states using the so-called state
characterization matrices. If the data set does not contain a characteristic set,
the algorithm guarantees consistency at the cost of outputting
the prefix tree acceptor (PTA) of the positive sample.

In 1992 Oncina and Garcia proposed the RPNI (Regular Positive and Negative 
Inference) algorithm [12], and in the same year Lang described the TB algorithm 
and generalized it to produce a (not necessarily minimum) DFA consistent with a 
sparsely labeled tree [8]. The algorithm (Traxbar) can deal with incomplete data 
sets as well as complete data sets. 

All the algorithms mentioned above are data-dependent (also called data-driven) 
and they do not take into account any evidence present in the sample. Since 1997, 
several evidence-driven algorithms have been proposed. The main contribution to 
the field in this direction is due to the Abbadingo One contest which took place in 
1997. The competition was held by Kevin J. Lang and Barak A. Pearlmutter and 
presented the challenge of predicting, with 99% accuracy, the labels that an unseen 
finite state automaton would assign to test data given training data consisting of 
positive and negative examples. There were two winners: Robert Price, for solving 
the 60,000-string, 506-state problem and Hugues Juillé, for solving the 1,521-string, 
65-state problem. 


2 Preliminaries 


In this paper we use the standard definitions and notations of formal language 
theory. The reader is referred to [6, 10] for further information about this domain. 
Let $\Sigma$ be a finite set of symbols called the alphabet and let $\Sigma^*$ be the set
of strings over $\Sigma$. A language $L$ over $\Sigma$ is a subset of $\Sigma^*$. The
elements of $L$ are called words. Let $u, v, w$ be strings in $\Sigma^*$ and $|w|$ be the
length of the string $w$. $\lambda$ is a special string called the empty string and has
length 0. Given a string $w = uv$, $u$ is a prefix of $w$ and $v$ is a suffix of $w$.
We define:

$Pr(L) = \{u \in \Sigma^* \mid \exists v \in \Sigma^* \text{ such that } uv \in L\}$,

$Suf(L) = \{v \in \Sigma^* \mid \exists u \in \Sigma^* \text{ such that } uv \in L\}$,

$L_u = \{v \in \Sigma^* \mid uv \in L\}$.
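
For a finite language these three operators can be computed directly. The following
Python sketch is only an illustration; the function names are chosen here and are not
part of the paper's notation, strings stand for words and the empty string stands for
$\lambda$.

```python
def prefixes(language):
    """Pr(L): all prefixes (the empty string included) of the words of L."""
    return {w[:i] for w in language for i in range(len(w) + 1)}


def suffixes(language):
    """Suf(L): all suffixes (the empty string included) of the words of L."""
    return {w[i:] for w in language for i in range(len(w) + 1)}


def left_quotient(language, u):
    """L_u: the words v such that uv belongs to L."""
    return {w[len(u):] for w in language if w.startswith(u)}


# A small finite language used only as an example.
L = {"0", "1", "010", "011"}
print(sorted(prefixes(L)))
print(sorted(suffixes(L)))
print(sorted(left_quotient(L, "01")))
```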


2.1 Finite automata


A deterministic finite automaton (DFA) is a 5-tuple $A = (Q, \Sigma, \delta, q_0, F)$ where
$Q$ is a finite set of states, $\Sigma$ is a finite alphabet, $q_0 \in Q$ is the initial
state, $F \subseteq Q$ is the set of final states and $\delta$ is a partial function that
maps $Q \times \Sigma$ to $Q$. The transition function $\delta$ can be extended to strings
by setting $\delta(q, \lambda) = q$ and $\delta(q, ua) = \delta(\delta(q, u), a)$,
$\forall q \in Q$, $\forall u \in \Sigma^*$, $\forall a \in \Sigma$. A word $u$ is accepted
by $A$ if $\delta(q_0, u) \in F$. The set of words accepted by $A$ is denoted by $L(A)$.
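
The definition translates almost literally into code. The sketch below is a minimal
illustration, not an implementation taken from the paper: the class name, the use of a
dictionary for the partial function $\delta$ and the example automaton are all
assumptions made here.

```python
class DFA:
    """A DFA (Q, Sigma, delta, q0, F) whose transition function may be partial."""

    def __init__(self, states, alphabet, delta, start, finals):
        self.states = set(states)
        self.alphabet = set(alphabet)
        self.delta = dict(delta)      # maps (state, symbol) to a state
        self.start = start
        self.finals = set(finals)

    def run(self, state, word):
        """Extended transition function; returns None where it is undefined."""
        for symbol in word:
            if (state, symbol) not in self.delta:
                return None
            state = self.delta[state, symbol]
        return state

    def accepts(self, word):
        """A word is accepted if it leads from the initial state to a final state."""
        return self.run(self.start, word) in self.finals


# Hypothetical example: a two-state automaton accepting the words (ab)*.
dfa = DFA({0, 1}, {"a", "b"}, {(0, "a"): 1, (1, "b"): 0}, 0, {0})
print(dfa.accepts("abab"), dfa.accepts("a"))
```

Because $\delta$ is partial, a word whose run leaves the defined transitions is simply
rejected.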

A non-deterministic finite automaton (NFA) is defined like a DFA, with the only
difference that the transition function is a mapping from $Q \times \Sigma$ to $2^Q$.
In general, a finite state automaton (FSA) refers to either a DFA or an NFA.

A finite set $S_+$ is called a positive sample for the language $L$ if $S_+ \subseteq L$.
Analogously, a negative sample for the language $L$ is a finite set $S_-$ such that
$S_- \subseteq \Sigma^* \setminus L$. A completely labeled data set includes all example
strings up to a given length.

We say that an automaton is consistent with a sample if it accepts all positive 
examples and rejects all negative examples. A set is said to be structurally complete 
with respect to a DFA A if it covers each transition of A and uses each final state 
of A. 


2.2 Quotient automaton 


For any set $S$, a partition $\pi$ is a set of pairwise disjoint nonempty subsets of $S$
whose union is $S$. Let $s$ denote an element of $S$ and let $B(s, \pi)$ denote the unique
element, or block, of $\pi$ containing $s$. Given two partitions $\pi_i$ and $\pi_j$,
$\pi_i$ is finer than $\pi_j$ if every block of $\pi_j$ is a union of one or several
blocks of $\pi_i$. We denote this by $\pi_i \preceq \pi_j$.

Let $A = (Q, \Sigma, \delta, q_0, F)$ be an FSA. The quotient automaton
$A/\pi = (Q', \Sigma, \delta', B(q_0, \pi), F')$ is defined by:

$Q' = Q/\pi = \{B(q, \pi) \mid q \in Q\}$,

$F' = \{B \in Q' \mid B \cap F \neq \emptyset\}$,

$\delta'(B, a) = \{B' \in Q' \mid \exists q \in B, q' \in B' \text{ such that } q' \in \delta(q, a)\}$
for all $B \in Q'$, $a \in \Sigma$.


The states of $Q$ belonging to the same block $B$ of the partition $\pi$ are said to
be merged together. The set of all derived automata obtained by systematically
merging the states of $A$ represents a lattice of FSA [13]. Given a canonical DFA $A$
and a set $S_+$ that is structurally complete with respect to $A$, the lattice derived
from $PTA(S_+)$ is guaranteed to contain $A$ [2].
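
The construction of $A/\pi$ can be sketched as follows. This is an illustrative sketch
only: the function name `quotient`, the representation of blocks as frozensets and the
three-state example are assumptions made here. Transitions are stored in NFA form (each
pair maps to a set of states), since merging states may introduce non-determinism.

```python
def quotient(states, alphabet, delta, start, finals, partition):
    """Build the quotient automaton; delta maps (state, symbol) to a set of states."""
    block_of = {q: frozenset(block) for block in partition for q in block}
    if set(block_of) != set(states):
        raise ValueError("the partition must cover exactly the set of states")
    q_states = set(block_of.values())
    # A block is final if it contains at least one final state.
    q_finals = {block for block in q_states if block & set(finals)}
    q_delta = {}
    for (q, a), targets in delta.items():
        for target in targets:
            q_delta.setdefault((block_of[q], a), set()).add(block_of[target])
    return q_states, set(alphabet), q_delta, block_of[start], q_finals


# Hypothetical example: a chain accepting only "ab"; merging states 0 and 2
# yields an automaton accepting (ab)*.
delta = {(0, "a"): {1}, (1, "b"): {2}}
q_states, q_alphabet, q_delta, q_start, q_finals = quotient(
    {0, 1, 2}, {"a", "b"}, delta, 0, {2}, [{0, 2}, {1}])
print(q_delta)
```

The example also shows why merging generalizes: the quotient automaton accepts every
word of the original automaton and, in general, more.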


2.3 Prefix tree acceptor - augmented prefix tree acceptor 


Given a set $S_+$, let $PTA(S_+)$ denote the prefix tree acceptor for $S_+$. $PTA(S_+)$ is
a DFA that contains a path from the start state to an accepting state for each string
in $S_+$, modulo common prefixes. Clearly, $L(PTA(S_+)) = S_+$.


; TRIANGLE 8 e June 2012 


124 C. Tirnauca 


More formally, $PTA(S_+) = (Q, \Sigma, \delta, q_0, F)$ where:


• $Q = Pr(S_+)$,
• $\Sigma$ = the alphabet of $S_+$,
• $\delta(u, a) = ua$, for all $u, ua \in Q$,
• $q_0 = \lambda$ if $S_+ \neq \emptyset$, and $q_0 = \emptyset$ otherwise,
• $F = S_+$.


An augmented prefix tree acceptor (APTA) with respect to $S_+$ and $S_-$, denoted
$APTA(S_+, S_-)$, is defined as a 6-tuple $(Q, \Sigma, \delta, q_0, F_+, F_-)$ where:


• $Q = Pr(S_+ \cup S_-)$,
• $\Sigma$ = the alphabet of $S_+ \cup S_-$,
• $\delta(u, a) = ua$, for all $u, ua \in Q$,
• $q_0 = \lambda$ if $S_+ \cup S_- \neq \emptyset$, and $q_0 = \emptyset$ otherwise,
• $F_+ = S_+$,
• $F_- = S_-$.


Example

Consider the sets $S_+ = \{0, 1, 010, 011\}$ and $S_- = \{01, 11\}$. Then $PTA(S_+)$ and
$APTA(S_+, S_-)$ are illustrated in Figure 1.


Fig. 1. $PTA(S_+)$ and $APTA(S_+, S_-)$
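
The two definitions can be checked on this example with a short Python sketch. It is an
illustration under stated assumptions: the function name `build_apta` is chosen here,
states are represented by the prefixes themselves, the empty string plays the role of
$\lambda$, and the sample is assumed to be non-empty.

```python
def build_apta(positive, negative):
    """Return (Q, Sigma, delta, q0, F_plus, F_minus) for a non-empty sample."""
    words = set(positive) | set(negative)
    states = {w[:i] for w in words for i in range(len(w) + 1)}   # Pr(S+ U S-)
    alphabet = {a for w in words for a in w}
    delta = {(u[:-1], u[-1]): u for u in states if u}            # delta(u, a) = ua
    return states, alphabet, delta, "", set(positive), set(negative)


S_plus = {"0", "1", "010", "011"}
S_minus = {"01", "11"}
Q, sigma, delta, q0, F_plus, F_minus = build_apta(S_plus, S_minus)

# PTA(S+) is obtained with an empty negative sample.
pta_states = build_apta(S_plus, set())[0]

print(sorted(Q, key=lambda u: (len(u), u)))
print(sorted(pta_states, key=lambda u: (len(u), u)))
```

For this sample the APTA has seven states and the PTA six: the state 11 exists only
because of the negative example 11.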


3 Grammatical Inference


Grammatical inference is known as one of the most attractive paradigms of scientific
learning. The goal of any inference algorithm is, roughly, to discover a grammar
that generates a given set of sample sentences. The learning model that was first
introduced (and is also the most widely used) is learning in the limit. In this setting,
the learner has access to either a growing sequence of positive examples (learning from
text), or both positive and negative information (learning from informant), and has
to output its hypotheses. After some finite time, the guesses must converge to the
correct language.

Gold [4] shows that given a positive presentation one cannot identify the class 
of regular languages, and that any recursively enumerable class is identifiable using 
a complete presentation (positive and negative data). 

Learning paradigms seem not to be applicable to human learning:


• Gold’s identification in the limit framework has been criticized, as children seem
  to learn natural language without negative examples;
• all learning paradigms assume a known representation class;
• some learnability results are based on enumeration.


The problem of minimum automaton identification from incompletely labeled 
training data has been proved to be NP-complete [5]. However, the average case is 
tractable [8]. 


4 Algorithms for Learning DFA 


Below we present the most important algorithms for DFA identification from exam- 
ples. 


4.1 The Trakhtenbrot and Barzdin algorithm 


The algorithm proposed by Trakhtenbrot and Barzdin [15] produces the canonical
DFA for any language, from a complete data set, in polynomial time. Perhaps the
biggest advantage of this algorithm is its simplicity. Furthermore, it deals with data
sets of various sizes in a very short time frame. Unfortunately, it also has a
disadvantage: the algorithm merges compatible nodes in breadth-first order regardless of
any evidence or clues present within the training data. In other words, the attempted
merge order is predetermined, and very little search of the problem space is necessary
to determine the next merge pair.

It is important to note that the advantages and disadvantages outlined above
apply not only to the algorithm proposed by Trakhtenbrot and Barzdin [15] for
complete data sets but also to the modified version of this algorithm for incomplete
training data.

Given an APTA $A = (Q, \Sigma, \delta, q_0, F_+, F_-)$, we say that two states $p$ and $q$
are distinguishable in $A$ if there exists a word $u$ in $\Sigma^*$ such that
($\delta(p, u) \in F_+$ and $\delta(q, u) \in F_-$) or ($\delta(p, u) \in F_-$ and
$\delta(q, u) \in F_+$). Otherwise, $p$ and $q$ are not distinguishable in $A$. For a
detailed description of the procedure distinguishable(p, q, A), the reader is referred
to [3].

$U$ is a set of unique nodes; that is, nodes that are pairwise distinguishable. The
algorithm starts by adding the root of the APTA to the list of unique nodes. Then
it visits each subsequent node $q$ of the APTA in breadth-first order, comparing the
subtree rooted at $q$ with the subtree rooted at each node in the list of unique nodes.
If $q$ is distinguishable from every node in $U$, it appends $q$ to the end of
the list. Otherwise, it disconnects $q$ from the APTA.

An upper bound on the running time of the algorithm is $mn^2$, where $m$ is the
total number of nodes in the initial APTA and $n$ the total number of states in the
final hypothesis (more details in [9]).

The TB algorithm is described below:


Algorithm 1 TB Algorithm
TB($APTA(S_+, S_-)$)
$A := APTA(S_+, S_-)$;
$U := \{\lambda\}$;
While p visits each subsequent node of A in breadth-first order
    dist := true;
    While (q in U) and (dist)
        dist := distinguishable(p, q, A);
    End While;
    If dist then append p to U
    else disconnect p from A;
End While;
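
The procedure can be sketched in Python as follows. This is a hedged illustration, not
the authors' implementation: the function names are chosen here, states are the
prefixes of the sample, and "disconnect" is read as redirecting the incoming transition
of the node to the unique node it cannot be distinguished from, which is an assumption
about the intended meaning. Distinguishability is always tested on the original tree.

```python
from collections import deque


def tb(positive, negative):
    """Sketch of the TB procedure on the APTA of a (complete) sample."""
    pos, neg = set(positive), set(negative)
    words = pos | neg
    nodes = {w[:i] for w in words for i in range(len(w) + 1)}
    alphabet = sorted({a for w in words for a in w})
    tree = {(u[:-1], u[-1]): u for u in nodes if u}

    def distinguishable(p, q):
        # Walk both subtrees in parallel, looking for opposite labels.
        stack = [(p, q)]
        while stack:
            x, y = stack.pop()
            if (x in pos and y in neg) or (x in neg and y in pos):
                return True
            for a in alphabet:
                if (x, a) in tree and (y, a) in tree:
                    stack.append((tree[x, a], tree[y, a]))
        return False

    unique = [""]            # the list U, starting with the root
    delta = {}               # transitions of the hypothesis
    queue = deque([""])
    while queue:
        parent = queue.popleft()
        for a in alphabet:
            child = tree.get((parent, a))
            if child is None:
                continue
            match = next((q for q in unique if not distinguishable(child, q)), None)
            if match is None:
                unique.append(child)
                delta[parent, a] = child
                queue.append(child)
            else:
                # Assumed reading of "disconnect": redirect the edge to the match.
                delta[parent, a] = match
    finals = {q for q in unique if q in pos}
    return unique, delta, finals


# Complete sample up to length 4 for "words over {a} of even length".
states, delta, finals = tb({"", "aa", "aaaa"}, {"a", "aaa"})
print(states, delta, finals)
```

On this sample the sketch returns the two-state automaton for words of even length:
the node aa cannot be distinguished from the root, so the transition leaving a is
redirected to the root.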


4.2 Gold’s algorithm 


The algorithm proposed by Gold [2] is based on the so-called state characterization
matrix.

A state characterization matrix over an alphabet $\Sigma$ is a triple $(S, E, T)$ where
$S$, $E$ are finite subsets of $\Sigma^*$ and $T : (S \cup S\Sigma)E \to \{0, 1, \uparrow\}$.
The elements of $S$ are called states, and those of $E$ are called experiments. The
function $T$ is defined using the sets $S_+$ and $S_-$ as follows. For all
$u \in S \cup S\Sigma$ and $v \in E$,

$$T(uv) = \begin{cases} 1, & \text{if } uv \in S_+ \\ 0, & \text{if } uv \in S_- \\ \uparrow, & \text{otherwise.} \end{cases}$$

Every element $u$ of $S \cup S\Sigma$ defines a row which will be called $row(u)$. Given
$u, v \in S \cup S\Sigma$, we say that $row(u)$ is obviously different from $row(v)$, and
we write $row(u) \neq row(v)$, if there exists an experiment $e \in E$ such that
$T(ue), T(ve) \in \{0, 1\}$ and $T(ue) \neq T(ve)$.

A state characterization matrix is called closed if none of the rows in $S\Sigma - S$ is
obviously different from the rows in $S$.
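
The table entries and the test for obviously different rows can be sketched in Python.
This is an illustration only: the function names are chosen here, `None` stands for the
undefined entry, and the choice of the experiment set (all suffixes of the sample) is
an assumption made for the example.

```python
def table_entry(word, pos, neg):
    """T(word): 1 for a positive example, 0 for a negative one, None otherwise."""
    if word in pos:
        return 1
    if word in neg:
        return 0
    return None


def obviously_different(u, v, experiments, pos, neg):
    """True if some experiment e gives defined and different entries for ue and ve."""
    for e in experiments:
        tu = table_entry(u + e, pos, neg)
        tv = table_entry(v + e, pos, neg)
        if tu is not None and tv is not None and tu != tv:
            return True
    return False


pos, neg = {"0", "1", "010", "011"}, {"01", "11"}
# Assumption for this example: the experiments are all suffixes of the sample.
E = {w[i:] for w in pos | neg for i in range(len(w) + 1)}

print(obviously_different("", "0", E, pos, neg))
print(obviously_different("0", "1", E, pos, neg))
```

The rows of the empty word and of 0 are obviously different because the experiment 1
yields the positive word 1 and the negative word 01. The rows of 0 and 1 are not,
since no experiment gives two defined and different entries.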

Gold’s algorithm was initially established using Mealy machines. Here we use 
Moore machines. Doing it this way, the comparisons between algorithms can be seen 
more clearly. 

A Moore machine is a 6-tuple $M = (Q, \Sigma, \Gamma, \delta, q_0, \Phi)$, where $\Sigma$
(resp. $\Gamma$) is the input (resp. output) alphabet, $\delta$ is a partial function that
maps $Q \times \Sigma$ to $Q$ and $\Phi$ is a function that maps $Q$ to $\Gamma$, called
the output function. The behavior of $M$ is given by the partial function
$t_M : \Sigma^* \to \Gamma$ defined by $t_M(u) = \Phi(\delta(q_0, u))$, for every $u$ in
$\Sigma^*$ such that $\delta(q_0, u)$ is defined.

Given two finite sets of words, $S_+$ and $S_-$, we define the prefix Moore machine
$PTM(S_+, S_-)$ as the Moore machine having $\Gamma = \{0, 1, \uparrow\}$,
$Q = Pr(S_+ \cup S_-)$, $q_0 = \lambda$ and $\delta(u, a) = ua$ if $u, ua \in Q$ and
$a \in \Sigma$. For every state $u$ the value of the output function associated to $u$
is 1, 0 or $\uparrow$ (undefined) depending on whether $u$ belongs to $S_+$, to $S_-$ or
to the complement of $S_+ \cup S_-$.

There are exactly two places where the algorithm may be nondeterministic. The
first one is when there are several rows from $S\Sigma - S$ that can be moved to $S$. The
second is when we are building the output automaton and there are several obviously
different rows (states) where the transition can be assigned. One solution that can
be adopted for both situations is to choose the smallest row in lexicographic order.

For a better understanding of Gold’s algorithm the reader is referred to [3] in 
which it is described as a procedure of merging states in the prefix Moore machine 
of the sample. 


Algorithm 2 Gold’s Algorithm
Gold($S_+$, $S_-$)
$S := \{\lambda\}$; $E := Suf(\Sigma^{-1}(S_+ \cup S_-))$;
Build the table $(S, E, T)$;
While there exists $s' \in (S\Sigma - S)$ s.t. $row(s') \neq row(s)$, $\forall s \in S$
    Choose any $s'$;
    $S := S \cup \{s'\}$;
    Update $(S, E, T)$;
End While;
$Q := S$; $q_0 := \lambda$;
For all $s \in S$
    $\Phi(s) := T(s)$;
    For all $a \in \Sigma$
        If $sa \in S$ then $\delta(s, a) := sa$
        else $\delta(s, a) :=$ any $s' \in S$ s.t. not($row(sa) \neq row(s')$);
    End For;
End For;
$M := (Q, \Sigma, \{0, 1, \uparrow\}, \delta, q_0, \Phi)$;
If $M$ is consistent with $(S_+, S_-)$ then Return($M$)
else Return($PTM(S_+, S_-)$).


4.3 RPNI algorithm


The regular positive and negative inference (RPNI) algorithm proposed by Oncina
and Garcia [12] is a polynomial time algorithm that identifies a target DFA, given
the sample $S = S_+ \cup S_-$. It was shown that if the sample includes a characteristic
set then the algorithm is guaranteed to return a canonical representation of the target
DFA [12].

In order to present the algorithm we need some definitions and notations.


• the set $Sp(L)$ of short prefixes of $L$ is
  $Sp(L) = \{u \in Pr(L) \mid \forall v \in \Sigma^* \text{ such that } L_u = L_v, u \le v\}$


• the kernel $N(L)$ of $L$ is
  $N(L) = \{\lambda\} \cup \{ua \in Pr(L) \mid u \in Sp(L), a \in \Sigma\}$

A sample $S = S_+ \cup S_-$ is said to be characteristic with respect to a regular
language $L$ (with the canonical DFA $A$) if it satisfies the following two conditions:

1. $N(L) \cap L \subseteq Pr(S_+)$,

2. $\forall u \in Sp(L), \forall v \in N(L)$, if $L_u \neq L_v$ then $\exists w \in \Sigma^*$
   such that ($uw \in S_+$ and $vw \in S_-$) or ($uw \in S_-$ and $vw \in S_+$).


Intuitively, condition 1 implies structural completeness with respect to A and 
condition 2 implies that for any distinct states of A there is a suffix w that correctly 
distinguishes them. 

Notice that: 


• if you add more strings to a characteristic sample it remains characteristic,
• there can be many different characteristic samples.


The RPNI algorithm is described below:


Algorithm 3 RPNI Algorithm
RPNI($PTA(S_+)$, $S_-$)
$A := PTA(S_+)$;
$K := \{q_0\}$; $Fr := \{\delta(q_0, a) \mid a \in \Sigma\}$;
While $Fr \neq \emptyset$
    choose $q$ from $Fr$;
    If $\exists p \in K$ such that $L(dmerge(A, p, q)) \cap S_- = \emptyset$
        then $A := dmerge(A, p, q)$
        else $K := K \cup \{q\}$;
    $Fr := \{\delta(q, a) \mid q \in K, a \in \Sigma\} - K$;
End While.


The convergence of the RPNI algorithm relies on the fact that sooner or later,
the set of labeled examples seen by the learner will include a characteristic set.

If the stream of examples provided to the learner is drawn according to a simple
distribution, the characteristic set would be made available relatively early (during
learning) with a sufficiently high probability, so the algorithm will converge quickly
to the desired target.

RPNI is an optimistic algorithm: at any step two states are compared and the 
question is: can they be merged? No positive evidence can be produced; merging will 
take place each time that such a merge does not produce inconsistency. Of course 
an early mistake can have disastrous effects and a breadth first exploration of the 
lattice is likely to be better. 
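
A compact Python sketch of this merging loop is given below. It is an illustration
under several assumptions and should not be read as the authors' implementation: states
are the prefixes of $S_+$, the frontier state is chosen in length-lexicographic order,
and `dmerge` is realised by redirecting the incoming transition of the frontier state
and then folding its subtree into the kernel state, which is one common way to keep the
automaton deterministic.

```python
def rpni(positive, negative):
    pos, neg = set(positive), set(negative)
    alphabet = sorted({a for w in pos | neg for a in w})
    states = {w[:i] for w in pos for i in range(len(w) + 1)} | {""}
    trans = {(u[:-1], u[-1]): u for u in states if u}     # PTA(S+)
    finals = set(pos)

    def accepts(trans, finals, word):
        q = ""
        for a in word:
            if (q, a) not in trans:
                return False
            q = trans[q, a]
        return q in finals

    def dmerge(trans, finals, p, q):
        """Merge frontier state q into p, folding q's subtree to stay deterministic."""
        trans, finals = dict(trans), set(finals)
        for key, target in list(trans.items()):
            if target == q:
                trans[key] = p

        def fold(r, b):
            if b in finals:
                finals.add(r)
            for a in alphabet:
                if (b, a) in trans:
                    if (r, a) in trans:
                        fold(trans[r, a], trans[b, a])
                    else:
                        trans[r, a] = trans[b, a]

        fold(p, q)
        return trans, finals

    kernel = [""]
    while True:
        frontier = {trans[q, a] for q in kernel for a in alphabet
                    if (q, a) in trans} - set(kernel)
        if not frontier:
            break
        q = min(frontier, key=lambda u: (len(u), u))
        for p in kernel:
            new_trans, new_finals = dmerge(trans, finals, p, q)
            # Accept the merge only if no negative example becomes accepted.
            if not any(accepts(new_trans, new_finals, w) for w in neg):
                trans, finals = new_trans, new_finals
                break
        else:
            kernel.append(q)      # no consistent merge: q joins the kernel
    delta = {k: v for k, v in trans.items() if k[0] in kernel}
    return kernel, delta, finals & set(kernel)


kernel, delta, finals = rpni({"", "aa", "aaaa"}, {"a", "aaa"})
print(kernel, delta, finals)
```

On this sample the merge of a with the root is rejected because it would accept the
negative word a, so a is added to the kernel. The next merge, of aa with the root, is
consistent and produces the two-state automaton for words of even length.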


4.4 Traxbar algorithm 


A variation of the Trakhtenbrot and Barzdin algorithm (Traxbar) was implemented 
by Lang [8] in order to show that random DFAs can be approximately learned from 
sparse uniform examples. 

The modifications made to the algorithm were needed to maintain consistency 
with incomplete training sets. For instance, unlabeled nodes and missing transitions 
in the APTA needed to be considered. 

The simple extensions added to the Trakhtenbrot and Barzdin algorithm are 
summarized as follows. 

If node p is to be merged with node q then: 


• labels of labeled nodes in the sub-tree rooted at p must be copied over their
  respective unlabeled nodes in the sub-tree rooted at q;

• transitions in any of the nodes in the sub-tree rooted at p that do not exist in
  their respective node in the sub-tree rooted at q must be spliced in.


An important observation is that the definition of distinguishable states does 
not change. However, because the sample is not complete, we do not know for all 
the states whether they are accepting or rejecting. 

As a result of these changes, the Traxbar algorithm will produce a (not neces- 
sarily minimum size) DFA that is consistent with the training set. 



Algorithm 4 Traxbar Algorithm
Traxbar($APTA(S_+, S_-)$)
$A := APTA(S_+, S_-)$;
$U := \{\lambda\}$;
While p visits each subsequent node of A in breadth-first order
    dist := true;
    While (q in U) and (dist)
        dist := distinguishable(p, q, A);
    End While;
    If dist then append p to U
    else A := merge(p, q, A);
End While;


Implementing this label copying process correctly requires careful attention to 
details, but the conceptually important thing is that the resulting merger of different 
parts of the training set increases its effective density and constrains succeeding 
choices of which state to merge. This can be good or bad depending on whether 
the algorithm’s greedy initial state merging choices are correct. If they are not, the 
resulting merger of unrelated sets of labels can cause the training set to look random 
and lead to an explosion in the size of the hypothesis. Conversely, if the initial choices 
are correct there can be a snowballing of constraints leading to a highly accurate 
hypothesis. Because the algorithm’s initial choices are so important they should be 
based on as much evidence as possible. 


4.5 EDSM algorithm 


Price won the Abbadingo One Learning Competition by using an evidence-driven 
state merging (EDSM) algorithm. Essentially, he realized that an effective way of 
choosing which pair of nodes to merge next within the APTA would simply involve 
selecting the pair of nodes whose subtrees share the most similar labels. 

A post-competition version of the EDSM algorithm as described by Lang, Pearl- 
mutter and Price [9] is included below. 

The score is calculated by assigning one point for each overlapping label node
within the subtrees rooted at the nodes considered for merging. If the two nodes
are distinguishable, the score is $-\infty$. No merging is possible when all the remaining
pairs of nodes are pairwise distinguishable.
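
On the initial APTA, where both arguments are roots of subtrees, the score can be
sketched as a parallel walk. This is a simplified illustration: the function name and
data layout are assumptions made here, and a full EDSM implementation would also have
to compute the score on the partially merged automaton, which this sketch does not do.

```python
import math


def score(p, q, delta, alphabet, pos, neg):
    """One point per pair of equally labeled nodes; -inf if the labels conflict."""
    total = 0
    stack = [(p, q)]
    while stack:
        x, y = stack.pop()
        x_label = 1 if x in pos else 0 if x in neg else None
        y_label = 1 if y in pos else 0 if y in neg else None
        if x_label is not None and y_label is not None:
            if x_label != y_label:
                return -math.inf          # p and q are distinguishable
            total += 1
        for a in alphabet:
            if (x, a) in delta and (y, a) in delta:
                stack.append((delta[x, a], delta[y, a]))
    return total


pos, neg = {"0", "1", "010", "011"}, {"01", "11"}
nodes = {w[:i] for w in pos | neg for i in range(len(w) + 1)}
delta = {(u[:-1], u[-1]): u for u in nodes if u}
alphabet = ["0", "1"]

print(score("0", "1", delta, alphabet, pos, neg))
print(score("", "0", delta, alphabet, pos, neg))
```

For the sample of Figure 1, the nodes 0 and 1 score two points (both are accepting and
their 1-successors 01 and 11 are both rejecting), while the root and 0 are
distinguishable because 1 is accepting and 01 is rejecting.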

The general idea of the EDSM approach is to avoid bad merges by selecting the
pair of nodes within the APTA which has the highest score. It is expected that the
scoring will indicate the correctness of each merge, since on average, a merge that
survives more label comparisons is more likely to be correct [9].


Algorithm 5 EDSM Algorithm
EDSM($APTA(S_+, S_-)$)
$A := APTA(S_+, S_-)$;
For all pairs (p, q) in Q
    compute score(p, q);
End For;
Repeat
    Find p, q in Q such that score(p, q) is maximum and positive;
    A := merge(p, q, A);
until no merge is possible.


Unfortunately, the difficulty of detecting bad merge choices increases as the 
density of the training data decreases. Since the number of labeled nodes decreases 
within the APTA as the training data becomes more sparse, the idea of selecting 
merge pairs based on the highest score proves less effective. 

This explains why the EDSM approach did well with large automata but not 
as well with low density problems. Considering every potential merge pair at each 
stage of the inference process is computationally expensive. 


4.6 Windowed EDSM algorithm 


To improve the running time of the EDSM algorithm, one possibility is to merge 
only those nodes that lie within a fixed sized window from the root node of the 
APTA. The recommended size of the window is twice the number of states in the 
target DFA. This might be a problem when the size of the target DFA is not known. 
However, a simple solution is to execute the algorithm several times while gradually 
increasing the window size. Unfortunately, this approach also has a drawback since 
there is no way of knowing when to stop increasing the window size. The Windowed 
EDSM algorithm is described below. 

As expected, the running time of the W-EDSM algorithm is much better than 
that of EDSM. The improvement in the running time is due to the reduction of 
the search space at each merge step of the algorithm. Of course, this can harm the 
performance of the algorithm in the relatively rare case in which high scoring merges 
involving deep nodes may be excluded from the window. For instance, the ideal 
algorithm would consider all possible merge pairs, and select for merging those pairs 
of nodes that score highest. Since such an algorithm is computationally expensive, 
only a subset of possible merge pairs are to be considered. 

We denote by $n$ the number of states of the target DFA. $Q$ is the set of states of
the APTA $A$ and the score is computed in the same way as in the EDSM algorithm.


Algorithm 6 W-EDSM Algorithm
W-EDSM($APTA(S_+, S_-)$)
$A := APTA(S_+, S_-)$;
$W := \{\lambda\}$;
winsize := 2 * n;
Repeat
    While (size(W) < winsize) and ($W \neq Q$)
        find the next node q in breadth-first order;
        add q to W;
    End While;
    max := -1;
    For all p, q in W do
        compute score(p, q);
        If score(p, q) > max then
            max := score(p, q);
            $p_{max}$ := p;
            $q_{max}$ := q;
    End For;
    If (max > -1) then A := merge($p_{max}$, $q_{max}$, A)
    else winsize := 2 * winsize;
until (W = Q).


It is conjectured that a tight upper bound on the running time of the W-EDSM 
algorithm is closer to m°n than to m+n where m is the number of nodes in the 
APTA and n is the number of states in the final hypothesis [9]. 
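
The effect of the window on the number of candidate pairs can be seen in a small Python
sketch. It only illustrates how a breadth-first window could be collected on the
initial tree; the function name `bfs_window` and the example are assumptions made here,
and the merging step itself is not shown.

```python
from collections import deque
from itertools import combinations


def bfs_window(delta, alphabet, root, winsize):
    """Return the first `winsize` nodes of the tree in breadth-first order."""
    window, queue = [], deque([root])
    while queue and len(window) < winsize:
        node = queue.popleft()
        window.append(node)
        for a in alphabet:
            if (node, a) in delta:
                queue.append(delta[node, a])
    return window


# APTA of the sample S+ = {0, 1, 010, 011}, S- = {01, 11}.
words = {"0", "1", "010", "011", "01", "11"}
nodes = {w[:i] for w in words for i in range(len(w) + 1)}
delta = {(u[:-1], u[-1]): u for u in nodes if u}

window = bfs_window(delta, ["0", "1"], "", 4)
pairs = list(combinations(window, 2))          # candidate merge pairs
all_nodes = bfs_window(delta, ["0", "1"], "", len(nodes))

print(window, len(pairs))
```

With a window of four nodes only six pairs have to be scored, instead of the 21 pairs
over all seven nodes of the tree.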


4.7 Blue-fringe algorithm 


An alternative windowing method to that used by the W-EDSM algorithm is also 
described by Lang, Pearlmutter and Price [9]. It uses a red and blue coloring scheme 
to provide a simple but effective way of choosing the pool of merge candidates 
at each merge level in the search. The Blue-fringe windowing method helps the 
implementation of the algorithm and improves on its running time. 

Similar to the W-EDSM algorithm, Blue-fringe EDSM places a restriction on the 
merge order. For example, the algorithm always starts with the root node colored 
red and its children blue resulting in a maximum of two possible merge pairs to 
choose from at the start. 

Considering the sparseness of some of the data sets, one would assume that
the pool of possible merge pairs would be greatest at the start and then gradually
decrease to save on the running time. All the evidence in the training data would be
considered at the start, which helps to make the correct decisions in the initial stage
of the algorithm. This is important since changing the label of a node after it has
been labeled as a result of a merge is not possible within this algorithm. Instead, as
the algorithm progresses, the number of red and blue nodes increases, resulting in
a large number of possible merge choices.


Algorithm 7 Blue-fringe Algorithm
Blue-fringe($APTA(S_+, S_-)$)
$A := APTA(S_+, S_-)$;
Red := $\{\lambda\}$;
Blue := $\emptyset$;
Repeat
    For all p in Red do
        For all sons q of the node p do
            If q not in Red then add q to Blue;
        End For;
    End For;
    max := -1;
    For all (p in Red) and all (q in Blue) do
        compute score(p, q);
        If score(p, q) > max then
            max := score(p, q);
            $p_{max}$ := p;
            $q_{max}$ := q;
    End For;
    If there exists (q in Blue) such that for all p in Red score(p, q) = $-\infty$
        then add q to Red;
             remove q from Blue;
        else A := merge($p_{max}$, $q_{max}$, A);
until (Blue = $\emptyset$) and (max = -1).

Despite the restriction in the merge order and the reduction in merge choices at 
each merge level within the search tree, Blue-fringe EDSM is very effective and its 
inference capabilities are comparable with those of W-EDSM. 

The score is computed in the same way as in the EDSM algorithm. We should 
add that when the algorithm promotes the blue node which is distinguishable from 
each red node, it chooses the shallowest one. 

The upper bound on the running time of the Blue-fringe algorithm is $mn^3$, where
$m$ is the total number of nodes in the initial APTA and $n$ is the total number of
states in the final hypothesis [9]. It is important to note that this is an
order of magnitude greater than that of the Traxbar algorithm.


4.8 SAGE algorithm


The inference engine used by Hugues Juillé is vastly different from the algorithms 
discussed thus far. Actually, Juillé and Pollack were the first to use random sampling 
techniques on search trees as a heuristic to control the search. The idea of using a 
tree to visualize the search space is very practical. 

The algorithm is based on a Self-Adaptive Greedy Estimate search procedure 
(SAGE). Each iteration of the search procedure consists of two phases: a construction 
phase and a competition phase. 

It is in the construction phase that the list of alternatives or merge choices is 
determined. All the alternatives in the list have the same depth within the search 
tree. Each member of a population of processing elements is then assigned one 
alternative from the list. Each processing element then scores its assigned alternative 
by randomly choosing a path down the search tree until a leaf node is reached or a 
valid solution is encountered. Next, the competition phase kicks in. 

The scores assigned to each alternative in the search tree are then used to guide 
the search. The meta-level heuristic determines whether to continue with the next 
level of search. If so, each processing element is randomly assigned one of the children 
of its assigned alternative. The search ends when no new node can be expanded upon. 

To avoid an exhaustive search of the problem space only the first set of initial 
merges are explored. These are thought of as the most critical merge choices since 
each merge places constraints on future merges. 


5 Concluding Remarks 


We have reviewed the Trakhtenbrot and Barzdin (TB), Gold, RPNI and Lang
algorithms. As can be seen in [3], the first two are in fact the same, while the first
description that Lang provides of the TB algorithm agrees with it only when the
sample is uniformly complete. The extension he gives to obtain consistent
hypotheses is in fact the RPNI algorithm. The evidence driven state merging technique
gives better results on large and sparse data sets, mainly because it avoids
"bad mergings" on the basis of the available evidence. A totally different approach is the
algorithm introduced by Hugues Juillé, in which random sampling techniques are
used.

Our main contribution consists of presenting these algorithms in the same
framework, which makes the comparison between them much easier and offers a solid base
for those who are at the beginning of their research career in Grammatical Inference
in general, and in state merging strategies for identification in the limit of DFA in
particular.
Summary. This paper shows that Petri net control mechanisms in grammars can
be considered to be a unifying approach to different controls and characterizations
which are used in well-known families of languages such as matrix and vector
languages.


1 Introduction 


It is well known that context-free languages are not sufficient to model phenomena
which occur in natural and programming languages. On the other hand,
context-sensitive languages are much more powerful but have undesirable properties (e.g.,
among decision problems only membership is decidable and all existing algorithms
for this problem have exponential complexity). Therefore, the languages of interest
are those that use context-free production rules and have larger generative
capacities. Many grammars have a context-free base and a mechanism that controls the
sequences of rules in a derivation [1, 3, 4, 5, 8, 10, 14, 15].

A context-free grammar and its derivation process can be described by a Petri
net, where places correspond to nonterminals and terminals, transitions are the
counterpart of productions, and tokens reflect the occurrences of symbols in the
sentential form, and there is a one-to-one correspondence between the application
of (sequences of) rules and the firing of (sequences of) transitions. Therefore,
controlling the sequences of rules in a derivation by adding some features to the associated
Petri net is natural and straightforward.

In [2] it has been shown that by adding some places and arcs which satisfy some
structural requirements, random context languages can be generated. In this paper
we show that vector, semi-matrix and matrix languages can also be generated by
grammars controlled by Petri nets. Thus, control by Petri nets can be considered
as a unifying approach to different types of controls and characterizations used in
well-known language families.

The paper is organized as follows. In Section 2 we give some notions and definitions
from the theories of formal languages and Petri nets needed in the sequel.
We also introduce the Petri net associated with a context-free grammar. In Section 3 
we construct the Petri net control mechanisms, define the corresponding grammars, 
and give some examples. Section 4 contains the results on the generative powers of 
families of languages generated by Petri net controlled grammars. 


2 Definitions 


The reader is assumed to be familiar with the basic notions of formal language 
theory and Petri net theory contained in [4, 6, 11, 7, 9, 12, 13]. 


2.1 Grammars 


Let $\Sigma = \{a_1, a_2, \ldots, a_k\}$ be an alphabet. A string over $\Sigma$ is a sequence of symbols
from the alphabet. If $w = w_1 w_2 w_3$ for some $w_1, w_2, w_3 \in \Sigma^*$, then $w_2$ is called a
substring of $w$. A word $u = u_1 u_2 \cdots u_n$, $u_1, u_2, \ldots, u_n \in \Sigma^*$, is a scattered substring
of $v \in \Sigma^*$, denoted by $u \preceq v$, if $v = v_1 u_1 v_2 \cdots u_n v_{n+1}$ for some
$v_1, v_2, \ldots, v_{n+1} \in \Sigma^*$ with $n \geq 1$. We write $u \not\preceq v$ if $u$ is not a scattered
substring of $v$. The length of a string $w$ is denoted by $|w|$, and the number of occurrences
of a symbol $a$ in a string $w$ by $|w|_a$. The empty string is denoted by $\lambda$, which is of length 0.
The set of all strings over the alphabet $\Sigma$ is denoted by $\Sigma^*$. A subset $L$ of $\Sigma^*$ is called a language.
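
As a small illustration of these notions, the following sketch tests the scattered substring relation and counts symbol occurrences for strings represented as Python `str` values. The function names `is_scattered_substring` and `count_symbol` are hypothetical, and the empty word is treated as a scattered substring of every string, which is a convention assumed here.

```python
def is_scattered_substring(u: str, v: str) -> bool:
    """True if the symbols of u occur in v in the same order (gaps allowed)."""
    remaining = iter(v)
    # `ch in remaining` consumes the iterator up to the first match, so later
    # symbols of u can only be matched to the right of earlier ones.
    return all(ch in remaining for ch in u)

def count_symbol(w: str, a: str) -> int:
    """|w|_a: the number of occurrences of the symbol a in w."""
    return w.count(a)
```

For instance, `ace` is a scattered substring of `abcde`, whereas `aec` is not, because the order of the symbols must be preserved.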


For $u, v \in \Sigma^*$ their shuffle is defined by $sh(u,v) = u_1 v_1 u_2 v_2 \cdots u_n v_n$ where
$u = u_1 u_2 \cdots u_n$ and $v = v_1 v_2 \cdots v_n$, $u_i, v_i \in \Sigma^*$, $1 \leq i \leq n$. A shuffle of $u$ and
$v$ is proper if neither $sh(u,v) = uv$ nor $sh(u,v) = vu$. A proper shuffle is denoted
by $sh^*(u,v)$. The operation shuffle can be generalized for more than two strings by
$sh(u_1, u_2, \ldots, u_n) = sh(sh(u_1, u_2, \ldots, u_{n-1}), u_n)$, $n \geq 3$. A shuffle of $u_1, u_2, \ldots, u_n$,
$n \geq 2$, is called semi, denoted by $ssh(u_1, u_2, \ldots, u_n)$, if whenever $u_i = u_j$ for some
$1 \leq i < j \leq n$, we have $sh^*(u_i, u_j) \not\preceq sh(u_1, u_2, \ldots, u_n)$. In words, a semi-shuffle string does not
contain self-shuffled scattered substrings.
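
The shuffle of two strings is not a single string: different factorizations of $u$ and $v$ give different results. The sketch below enumerates all of them for short strings. The names `shuffles` and `proper_shuffles` are hypothetical, and returning the whole set of possible shuffles is a choice made for illustration only.

```python
def shuffles(u: str, v: str) -> set:
    """All strings u_1 v_1 u_2 v_2 ... u_n v_n obtainable from u and v."""
    if not u:
        return {v}
    if not v:
        return {u}
    # The next symbol of the result comes either from u or from v.
    return ({u[0] + w for w in shuffles(u[1:], v)}
            | {v[0] + w for w in shuffles(u, v[1:])})

def proper_shuffles(u: str, v: str) -> set:
    """Shuffles that are neither uv nor vu."""
    return shuffles(u, v) - {u + v, v + u}
```

For `u = "ab"` and `v = "c"` the shuffles are `abc`, `acb` and `cab`, of which only `acb` is proper.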
A context-free grammar is a quadruple $G = (V, \Sigma, S, R)$ where $V$ and $\Sigma$ are the
disjoint finite sets of nonterminal and terminal symbols, respectively, $S \in V$ is the
start symbol and the finite set $R \subseteq V \times (V \cup \Sigma)^*$ is the set of (production) rules.
Usually, a rule $(A, x)$ is written as $A \to x$. A rule of the form $A \to \lambda$ is called an
erasing rule. The word $x \in (V \cup \Sigma)^*$ directly derives $y \in (V \cup \Sigma)^*$, written as $x \Rightarrow y$,
iff there is a rule $r = A \to \alpha \in R$ such that $x = x_1 A x_2$ and $y = x_1 \alpha x_2$. The reflexive
and transitive closure of $\Rightarrow$ is denoted by $\Rightarrow^*$. A derivation using the sequence of
rules $\pi = r_1 r_2 \cdots r_n$ is denoted by $\xRightarrow{\pi}$ or $\xRightarrow{r_1 r_2 \cdots r_n}$. The language generated by $G$ is
defined by $L(G) = \{w \in \Sigma^* \mid S \Rightarrow^* w\}$.


A matrix grammar is a quadruple $G = (V, \Sigma, S, M)$ where $V, \Sigma, S$ are defined
as for a context-free grammar, and $M$ is a finite set of matrices which are finite strings
over a set of context-free rules. The language generated by the grammar $G$ consists
of all strings $w \in \Sigma^*$ such that there is a derivation $S \xRightarrow{r_1 r_2 \cdots r_n} w$ where
$r_1 r_2 \cdots r_n = m_{j_1} m_{j_2} \cdots m_{j_k}$ for some $m_{j_1}, m_{j_2}, \ldots, m_{j_k} \in M$.
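
A derivation controlled by matrices can be sketched as below. This is only an illustration under assumptions: symbols are single characters, each rule rewrites the leftmost occurrence of its left-hand side (in general any occurrence may be rewritten), and the names `apply_rule` and `derive_with_matrices` as well as the sample matrices are hypothetical.

```python
def apply_rule(form: str, rule):
    """Rewrite the leftmost occurrence of the rule's left-hand side, or return None."""
    lhs, rhs = rule
    i = form.find(lhs)
    if i < 0:
        return None                      # the rule is not applicable
    return form[:i] + rhs + form[i + len(lhs):]

def derive_with_matrices(start: str, matrices):
    """Apply the matrices one after another, each matrix rule by rule in order."""
    form = start
    for matrix in matrices:
        for rule in matrix:
            form = apply_rule(form, rule)
            if form is None:
                return None              # the matrix cannot be completed
    return form

# Sample matrices (assumption) over the rules S -> AB, A -> aA, B -> aB, A -> b, B -> b.
m_start = [("S", "AB")]
m_a = [("A", "aA"), ("B", "aB")]
m_end_b = [("A", "b"), ("B", "b")]
```

With these matrices, `derive_with_matrices("S", [m_start, m_a, m_end_b])` passes through `AB`, `aAB`, `aAaB`, `abaB` and ends in `abab`.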

A vector grammar is a quadruple $G = (V, \Sigma, S, M)$ whose components are
defined as for a matrix grammar. The language generated by the grammar $G$ consists
of all strings $w \in \Sigma^*$ such that there is a derivation $S \xRightarrow{r_1 r_2 \cdots r_n} w$ where
$r_1 r_2 \cdots r_n = sh(m_{j_1}, m_{j_2}, \ldots, m_{j_k})$ for some $m_{j_1}, m_{j_2}, \ldots, m_{j_k} \in M$.

A semi-matrix grammar is a quadruple $G = (V, \Sigma, S, M)$ whose components
are defined as for a matrix grammar. The language generated by the grammar $G$
consists of all strings $w \in \Sigma^*$ such that there is a derivation $S \xRightarrow{r_1 r_2 \cdots r_n} w$ where
$r_1 r_2 \cdots r_n = ssh(m_{j_1}, m_{j_2}, \ldots, m_{j_k})$ for some $m_{j_1}, m_{j_2}, \ldots, m_{j_k} \in M$.


A matrix (semi-matrix, vector) grammar $G$ is called without repetitions, if each
rule $r$ occurs in $M = \{m_1, m_2, \ldots, m_n\}$ only once, i.e., $|m_1 m_2 \cdots m_n|_r = 1$.

For each matrix grammar, an equivalent matrix grammar without repetitions 
can be constructed by adding chain rules. 

The families of languages generated by matrix, vector and semi-matrix grammars
(with erasing rules) are denoted by MAT, V and sMAT ($MAT^\lambda$, $V^\lambda$ and
$sMAT^\lambda$), respectively.


2.2 Petri nets 


A (place/transition) Petri net (PN) is a construct $N = (P, T, F, \varphi)$ where $P$ and $T$
are disjoint finite sets of places and transitions, respectively, $F \subseteq (P \times T) \cup (T \times P)$
is the set of directed arcs, and $\varphi : F \to \{1, 2, \ldots\}$ is a weight function.


A Petri net can be represented by a bipartite directed graph with the node set
$P \cup T$ where places are drawn as circles, transitions as boxes and arcs as arrows with
labels $\varphi(p,t)$ or $\varphi(t,p)$. If $\varphi(p,t) = 1$ or $\varphi(t,p) = 1$, the label is omitted.


A mapping $\mu : P \to \{0, 1, 2, \ldots\}$ is called a marking. For each place $p \in P$,
$\mu(p)$ gives the number of tokens in $p$. Graphically, tokens are drawn as small solid
dots inside circles. ${}^\bullet x = \{y : (y,x) \in F\}$ and $x^\bullet = \{y : (x,y) \in F\}$ are called pre-
and post-sets of $x \in P \cup T$, respectively. The elements of ${}^\bullet t$ (${}^\bullet p$) are called input
places (transitions) and the elements of $t^\bullet$ ($p^\bullet$) are called output places (transitions)
of $t$ ($p$).


A transition $t \in T$ is enabled by marking $\mu$ iff $\mu(p) \geq \varphi(p,t)$ for all $p \in P$. In this
case $t$ can occur (fire). Its occurrence transforms the marking $\mu$ into the marking
$\mu'$ defined for each place $p \in P$ by $\mu'(p) = \mu(p) - \varphi(p,t) + \varphi(t,p)$. A finite sequence
$t_1 t_2 \cdots t_k$ of transitions is called an occurrence sequence enabled at a marking $\mu$
if there are markings $\mu_1, \mu_2, \ldots, \mu_k$ such that $\mu \xrightarrow{t_1} \mu_1 \xrightarrow{t_2} \cdots \xrightarrow{t_k} \mu_k$. In short this
sequence can be written as $\mu \xrightarrow{t_1 t_2 \cdots t_k} \mu_k$ or $\mu \xrightarrow{\nu} \mu_k$ where $\nu = t_1 t_2 \cdots t_k$. For each
$1 \leq i \leq k$, the marking $\mu_i$ is called reachable from the initial marking $\mu$.
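
The enabling and firing rule can be sketched as follows. The representation is an assumption made for illustration: a marking is a dictionary from place names to token counts, and a transition is given by two dictionaries `pre` and `post` holding the arc weights $\varphi(p,t)$ and $\varphi(t,p)$ of its input and output places. The names `enabled`, `fire` and `run` are hypothetical.

```python
def enabled(marking: dict, pre: dict) -> bool:
    """t is enabled iff every input place holds at least phi(p, t) tokens."""
    return all(marking.get(p, 0) >= w for p, w in pre.items())

def fire(marking: dict, pre: dict, post: dict) -> dict:
    """Return the marking mu'(p) = mu(p) - phi(p, t) + phi(t, p)."""
    if not enabled(marking, pre):
        raise ValueError("transition is not enabled")
    new = dict(marking)                  # the old marking is left untouched
    for p, w in pre.items():
        new[p] = new.get(p, 0) - w
    for p, w in post.items():
        new[p] = new.get(p, 0) + w
    return new

def run(marking: dict, net: dict, sequence) -> dict:
    """Fire an occurrence sequence; net maps a transition name to (pre, post)."""
    for t in sequence:
        pre, post = net[t]
        marking = fire(marking, pre, post)
    return marking
```

Starting from two tokens in `p1`, a transition with `pre = {"p1": 1}` and `post = {"p2": 3}` can fire exactly twice, after which it is no longer enabled.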


A sequence of places and transitions $\rho = x_1 x_2 \cdots x_n$ is called a path if and only
if no place or transition except $x_1$ and $x_n$ appears more than once, and $x_{i+1} \in x_i^\bullet$
for all $1 \leq i \leq n-1$. $tr(\rho)$ denotes the sequence of transitions in a path $\rho$. A path
$\rho = x_1 x_2 \cdots x_n$ is a cycle if $n = 1$ or $x_1 = x_n$ and $n \geq 2$. A path $\rho = x_1 x_2 \cdots x_n$ is
a chain if $n = 1$ or $x_1 \neq x_n$ and $n \geq 2$. We denote by $P_\rho$, $T_\rho$, $F_\rho$ the sets of places,
transitions, and arcs of a path $\rho$. Paths $\rho_1$ and $\rho_2$ are called disjoint if $P_{\rho_1} \cap P_{\rho_2} = \emptyset$
and $T_{\rho_1} \cap T_{\rho_2} = \emptyset$.


A marked Petri net is a system $N = (P, T, F, \varphi, \iota)$ where $(P, T, F, \varphi)$ is a Petri
net and $\iota$ is the initial marking. Let $M$ be a set of markings, which will be called final
markings. An occurrence sequence $\nu$ of transitions is called successful if it is enabled
at the initial marking $\iota$ and finished at a final marking $\tau$ of $M$.


2.3 cf Petri nets 


The construction of the following type of Petri nets is based on the idea of using 
similarity between the firing of a transition and the application of a production rule 
in a derivation in which places are terminal and nonterminal symbols and tokens 
are separate occurrences of symbols. 


Definition 1. Let $G = (V, \Sigma, S, R)$ be a context-free grammar. The cf Petri net with
respect to the grammar $G$ is a (labeled marked) Petri net $N = (P, T, F, \varphi, \iota, \beta, \gamma)$
where

• the labeling functions $\beta : P \to (V \cup \Sigma)$ and $\gamma : T \to R$ are bijections;
• $(p,t) \in F$ iff $\gamma(t) = A \to w \in R$ where $\beta(p) = A$, and $\varphi(p,t) = 1$;
• $(t,p) \in F$ iff $\gamma(t) = A \to w \in R$ where $\beta(p) = x$, $|w|_x > 0$, and $\varphi(t,p) = |w|_x$;
• $\iota(\beta^{-1}(S)) = 1$ and $\iota(\beta^{-1}(x)) = 0$ for all $x \in (V \cup \Sigma) \setminus \{S\}$.


Example 1. Let $G_1$ be a context-free grammar with the rules: $S \to AB$,
$A \to aA \mid bA \mid a \mid b$, $B \to aB \mid bB \mid a \mid b$ (the other components of the grammar can be seen
from these rules). Figure 1 illustrates the corresponding cf Petri net $N$. Obviously,
$L(G_1) = \{w \in \{a,b\}^* \mid |w| \geq 2\}$.

Fig. 1. A cf Petri net $N$.


The following proposition, which directly follows from the definition, shows the 
similarity between terminal derivations in a context-free grammar and successful 
occurrences in the corresponding cf Petri net. 


Proposition 1. Let $N = (P, T, F, \varphi, \iota, \beta, \gamma)$ be the cf Petri net with respect to a
context-free grammar $G = (V, \Sigma, S, R)$. Then $S \xRightarrow{r_1 r_2 \cdots r_n} w \in \Sigma^*$ is a derivation in $G$
iff $\iota \xrightarrow{t_1 t_2 \cdots t_n} \tau$ is an occurrence sequence of transitions in $N$ such that
$\gamma(t_1 t_2 \cdots t_n) = r_1 r_2 \cdots r_n$ and $\tau(\beta^{-1}(x)) = 0$ for all $x \in V$.
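
The construction of Definition 1 and the correspondence of Proposition 1 can be sketched as follows. The code is a minimal illustration under assumptions: places are identified with grammar symbols (so $\beta$ is the identity), transitions are identified with rule indices (so $\gamma$ maps an index to the rule at that position), and the names `cf_petri_net` and `fire_sequence` are hypothetical.

```python
def cf_petri_net(rules):
    """rules: list of (lhs, rhs) pairs; returns {rule_index: (pre, post)}."""
    net = {}
    for idx, (lhs, rhs) in enumerate(rules):
        pre = {lhs: 1}                   # arc (p, t) of weight 1 from the place of A
        post = {}
        for x in rhs:                    # arc (t, p) of weight |w|_x to the place of x
            post[x] = post.get(x, 0) + 1
        net[idx] = (pre, post)
    return net

def fire_sequence(net, start, sequence):
    """Fire the transitions in order from the initial marking; None if one is disabled."""
    marking = {start: 1}                 # one token in the place of S, none elsewhere
    for t in sequence:
        pre, post = net[t]
        if any(marking.get(p, 0) < w for p, w in pre.items()):
            return None
        for p, w in pre.items():
            marking[p] -= w
        for p, w in post.items():
            marking[p] = marking.get(p, 0) + w
    return marking

# Rules of the grammar G_1 from Example 1, indexed 0..8.
G1_RULES = [("S", "AB"), ("A", "aA"), ("A", "a"), ("A", "bA"), ("A", "b"),
            ("B", "aB"), ("B", "a"), ("B", "bB"), ("B", "b")]
```

Firing the transitions for $S \to AB$, $A \to aA$, $B \to aB$, $A \to b$, $B \to b$ (indices 0, 1, 5, 4, 8) empties all nonterminal places and leaves two tokens in each of the places of $a$ and $b$, which matches a derivation of a word with two occurrences of each terminal.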


3 Petri Net Controlled Grammars 


In this section we construct Petri net control mechanisms and define Petri net con- 
trolled grammars. 
3.1 Chain, cyclic and supervised cyclic controls


We add new places and arcs, called control places and arcs, to a cf Petri net in such
a way that the new places with input and output transitions make up chains or
cycles. Let $N = (P, T, F, \varphi, \iota, \beta, \gamma)$ be a cf Petri net with respect to a context-free
grammar $G = (V, \Sigma, S, R)$. Let $P_V = \{p \mid p = \beta^{-1}(x), x \in V\}$ and
$P_\Sigma = \{p \mid p = \beta^{-1}(x), x \in \Sigma\}$. We use these notations in all definitions hereinafter.


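Chain control. Let $\Pi = \{\rho_1, \rho_2, \ldots, \rho_n\}$ be a set of disjoint chains where for
all $\rho \in \Pi$, $\rho = t_1 p_1 t_2 p_2 \cdots p_{k-1} t_k$ where $T_\rho = \{t_1, t_2, \ldots, t_k\} \subseteq T$ and
$P_\rho = \{p_1, p_2, \ldots, p_{k-1}\}$, $F_\rho = \{(t_i, p_i) \mid 1 \leq i \leq k-1\} \cup \{(p_i, t_{i+1}) \mid 1 \leq i \leq k-1\}$
are the sets of new places and arcs, respectively. Let

$$P_\Pi = \bigcup_{\rho \in \Pi} P_\rho, \quad T_\Pi = \bigcup_{\rho \in \Pi} T_\rho, \quad F_\Pi = \bigcup_{\rho \in \Pi} F_\rho.$$

Without loss of generality we assume that $T = T_\Pi$ (since each transition of $T$ can
be considered as a chain or a cycle).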


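Definition 2. A chain-cf Petri net is a system $N_{ch} = (P_{ch}, T, F_{ch}, \varphi_{ch}, \iota_{ch}, \beta_{ch}, \gamma, M)$ where

• $P_{ch} = P \cup P_\Pi$ and $F_{ch} = F \cup F_\Pi$,
• the weight function $\varphi_{ch}$ is defined by
  $$\varphi_{ch}(x,y) = \begin{cases} \varphi(x,y) & \text{if } (x,y) \in F, \\ 1 & \text{if } (x,y) \in F_\Pi, \end{cases}$$
• the initial marking $\iota_{ch}$ is defined by
  $$\iota_{ch}(p) = \begin{cases} \iota(p) & \text{if } p \in P, \\ 0 & \text{if } p \in P_\Pi, \end{cases}$$
• the labeling function $\beta_{ch} : P_{ch} \to V \cup \Sigma \cup \{\lambda\}$ is defined by
  $$\beta_{ch}(p) = \begin{cases} \beta(p) & \text{if } p \in P, \\ \lambda & \text{if } p \in P_\Pi, \end{cases}$$
• $M$ is a set of final markings where for each $\tau \in M$, $\tau(p) = 0$ for all $p \in P_{ch} - P_\Sigma$.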


Example 2. Figure 2 illustrates a chain-cf Petri net N which is based on the context- 
free grammar introduced in Example 1. 


Fig. 2. A chain-cf Petri net $N_{ch}$.

According to the construction of a chain-cf Petri net $N_{ch}$, the firing of its
transitions follows the rules below:

• for each chain $\rho = t_1 p_1 t_2 p_2 \cdots t_{n-1} p_{n-1} t_n \in \Pi$, the firing of its transitions starts
  with the firing of transition $t_1$, and $t_1$ can occur at any time (when it is enabled);
  it does not depend on the next transitions in the sequence $tr(\rho)$;
• transition $t_i$, $1 < i \leq n$, can occur if $t_{i-1}$ has occurred, and for each occurrence
  sequence $\sigma$ of transitions of $N_{ch}$, $|\sigma|_{t_{i-1}} \geq |\sigma|_{t_i}$. Moreover, if $\sigma$ is successful
  then $|\sigma|_{t_1} = |\sigma|_{t_2} = \cdots = |\sigma|_{t_n}$;
• the transitions of disjoint chains can occur concurrently since the places of a
  chain merely control the firing of the transitions of the chain.


Consequently, 


Proposition 2. For each successful occurrence sequence of transitions $\sigma = t_1 t_2 \cdots t_n$
of a chain-cf Petri net $N_{ch}$, $\sigma = sh(tr(\rho_{i_1}), tr(\rho_{i_2}), \ldots, tr(\rho_{i_s}))$ for some (not
necessarily different) chains $\rho_{i_1}, \rho_{i_2}, \ldots, \rho_{i_s} \in \Pi$, $s \geq 1$.
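
The effect of the control places of a chain can be sketched as a counter check on a transition sequence. The sketch considers the control places only and ignores the places of the underlying cf Petri net. The function name `chain_control_ok` is hypothetical, and the sample chains are an assumption: they group the transitions as described in Example 5, since Figure 2 itself is not reproduced here.

```python
def chain_control_ok(sequence, chains) -> bool:
    """Check a transition sequence against the control places of disjoint chains."""
    position = {t: (c, i) for c, chain in enumerate(chains)
                for i, t in enumerate(chain)}
    tokens = {}                          # (chain, i): tokens between t_i and t_(i+1)
    for t in sequence:
        c, i = position[t]
        if i > 0:                        # t consumes a token from the place before it
            if tokens.get((c, i - 1), 0) == 0:
                return False
            tokens[(c, i - 1)] -= 1
        if i < len(chains[c]) - 1:       # t produces a token in the place after it
            tokens[(c, i)] = tokens.get((c, i), 0) + 1
    # Final marking: all control places are empty.
    return all(n == 0 for n in tokens.values())

# Assumed chains: t0 alone, and the pairs (t1, t5), (t3, t7), (t2, t6), (t4, t8).
CHAINS = [["t0"], ["t1", "t5"], ["t3", "t7"], ["t2", "t6"], ["t4", "t8"]]
```

The sequence `t0 t1 t3 t7 t5 t2 t6` passes the check, because it is a shuffle of the chains used, while `t0 t5 t1` fails, because `t5` would need a token that `t1` has not yet produced.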
Cyclic control. Let $\Pi = \{\rho_1, \rho_2, \ldots, \rho_n\}$ be a set of disjoint cycles where for all
$\rho \in \Pi$, $\rho = p_1 t_1 p_2 t_2 \cdots p_k t_k p_1$ where $T_\rho = \{t_1, t_2, \ldots, t_k\} \subseteq T$ and $P_\rho = \{p_1, p_2, \ldots, p_k\}$,
$F_\rho = \{(p_i, t_i) \mid 1 \leq i \leq k\} \cup \{(t_i, p_{i+1}) \mid 1 \leq i \leq k-1\} \cup \{(t_k, p_1)\}$ are the sets of
new places and arcs, respectively. Let

$$P_\Pi = \bigcup_{\rho \in \Pi} P_\rho, \quad T_\Pi = \bigcup_{\rho \in \Pi} T_\rho, \quad F_\Pi = \bigcup_{\rho \in \Pi} F_\rho.$$

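Definition 3. A cyclic-cf Petri net is a system $N_c = (P_c, T, F_c, \varphi_c, \iota_c, \beta_c, \gamma, M)$ where

• $P_c = P \cup P_\Pi$ and $F_c = F \cup F_\Pi$,
• the weight function $\varphi_c$ is defined by
  $$\varphi_c(x,y) = \begin{cases} \varphi(x,y) & \text{if } (x,y) \in F, \\ 1 & \text{if } (x,y) \in F_\Pi, \end{cases}$$
• the initial marking $\iota_c$ is defined by
  $$\iota_c(p) = \begin{cases} \iota(p) & \text{if } p \in P, \\ 1 & \text{if } p = p_1 \in P_\rho, \rho \in \Pi, \\ 0 & \text{if } p \in P_\rho - \{p_1\}, \rho \in \Pi, \end{cases}$$
• the labeling function $\beta_c : P_c \to V \cup \Sigma \cup \{\lambda\}$ is defined by
  $$\beta_c(p) = \begin{cases} \beta(p) & \text{if } p \in P, \\ \lambda & \text{if } p \in P_\Pi, \end{cases}$$
• $M$ is a set of final markings where for each $\tau \in M$,
  $$\tau(p) = \begin{cases} 0 & \text{if } p \in P_V, \\ 1 & \text{if } p = p_1 \in P_\rho, \rho \in \Pi, \\ 0 & \text{if } p \in P_\rho - \{p_1\}, \rho \in \Pi. \end{cases}$$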


Example 3. Figure 3 illustrates a cyclic-cf Petri net N which is based on the context- 
free grammar given in Example 1. 


Fig. 3. A cyclic-cf Petri net $N_c$.

According to the construction of a cyclic-cf Petri net $N_c$, the firing of its
transitions follows the rules below:

• for each cycle $\rho = p_1 t_1 p_2 t_2 \cdots p_n t_n p_1 \in \Pi$, the firing of its transitions starts with
  the firing of transition $t_1$, and $t_1$ can occur next when transitions $t_2, t_3, \ldots, t_n$
  have occurred in the defined order;
• for each occurrence sequence $\sigma$ of transitions of $N_c$, $|\sigma|_{t_{i-1}} = |\sigma|_{t_i}$ or
  $|\sigma|_{t_{i-1}} = |\sigma|_{t_i} + 1$, and if $\sigma$ is successful then $|\sigma|_{t_1} = |\sigma|_{t_2} = \cdots = |\sigma|_{t_n}$;
• the transitions of disjoint cycles can occur concurrently since the places of a cycle
  merely control the firing of the transitions of the cycle, while the occurrence of the
  transitions of the same cycle a second time can start when the first occurrence
  of the transitions finishes.


Consequently, 


Proposition 3. For each successful occurrence sequence of transitions $\sigma = t_1 t_2 \cdots t_n$
of a cyclic-cf Petri net $N_c$, $\sigma = ssh(tr(\rho_{i_1}), tr(\rho_{i_2}), \ldots, tr(\rho_{i_s}))$ for some (not
necessarily different) cycles $\rho_{i_1}, \rho_{i_2}, \ldots, \rho_{i_s} \in \Pi$, $s \geq 1$.
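
Because every cycle carries a single control token, its effect on a transition sequence can be sketched by tracking, for each cycle, where that token currently is. As before, only the control places are modelled, the function name `cyclic_control_ok` is hypothetical, and the sample cycles are an assumed grouping of the transitions that follows the description in Example 6.

```python
def cyclic_control_ok(sequence, cycles) -> bool:
    """Check a transition sequence against the control places of disjoint cycles."""
    position = {t: (c, i) for c, cycle in enumerate(cycles)
                for i, t in enumerate(cycle)}
    token_at = [0] * len(cycles)         # each cycle starts with its token in p_1
    for t in sequence:
        c, i = position[t]
        if token_at[c] != i:             # the input control place of t is empty
            return False
        token_at[c] = (i + 1) % len(cycles[c])
    # Final marking: every cycle has its token back in p_1.
    return all(i == 0 for i in token_at)

# Assumed cycles: t0 alone, and the pairs (t1, t5), (t3, t7), (t2, t6), (t4, t8).
CYCLES = [["t0"], ["t1", "t5"], ["t3", "t7"], ["t2", "t6"], ["t4", "t8"]]
```

Different cycles may interleave, so `t0 t1 t3 t5 t7 t2 t6` passes, but a cycle cannot be restarted before it has finished, so `t0 t1 t1 t5 t5` is rejected.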


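Supervised cyclic control. Let $\Pi = \{\rho_1, \rho_2, \ldots, \rho_n\}$ be a set of cycles such that
$P_{\rho_1} \cap P_{\rho_2} \cap \cdots \cap P_{\rho_n} = \{p_0\}$ where for all $\rho \in \Pi$, $\rho = p_0 t_1 p_1 t_2 \cdots p_{k-1} t_k p_0$ where
$T_\rho = \{t_1, t_2, \ldots, t_k\} \subseteq T$ and $P_\rho = \{p_0, p_1, \ldots, p_{k-1}\}$,
$F_\rho = \{(p_i, t_{i+1}) \mid 0 \leq i \leq k-1\} \cup \{(t_i, p_i) \mid 1 \leq i \leq k-1\} \cup \{(t_k, p_0)\}$ are the sets of new places and arcs,
respectively. Let

$$P_\Pi = \bigcup_{\rho \in \Pi} P_\rho, \quad T_\Pi = \bigcup_{\rho \in \Pi} T_\rho, \quad F_\Pi = \bigcup_{\rho \in \Pi} F_\rho.$$
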


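Definition 4. A system $N_{sc} = (P_{sc}, T, F_{sc}, \varphi_{sc}, \iota_{sc}, \beta_{sc}, \gamma, M)$ is called a supervised-cyclic-cf
Petri net where

• $P_{sc} = P \cup P_\Pi$ and $F_{sc} = F \cup F_\Pi$,
• the weight function $\varphi_{sc}$ is defined by
  $$\varphi_{sc}(x,y) = \begin{cases} \varphi(x,y) & \text{if } (x,y) \in F, \\ 1 & \text{if } (x,y) \in F_\Pi, \end{cases}$$
• the initial marking $\iota_{sc}$ is defined by
  $$\iota_{sc}(p) = \begin{cases} \iota(p) & \text{if } p \in P, \\ 1 & \text{if } p = p_0, \\ 0 & \text{if } p \in P_\rho - \{p_0\}, \rho \in \Pi, \end{cases}$$
• the labeling function $\beta_{sc} : P_{sc} \to V \cup \Sigma \cup \{\lambda\}$ is defined by
  $$\beta_{sc}(p) = \begin{cases} \beta(p) & \text{if } p \in P, \\ \lambda & \text{if } p \in P_\Pi, \end{cases}$$
• $M$ is a set of final markings where for each $\tau \in M$,
  $$\tau(p) = \begin{cases} 0 & \text{if } p \in P_V, \\ 1 & \text{if } p = p_0, \\ 0 & \text{if } p \in P_\rho - \{p_0\}, \rho \in \Pi. \end{cases}$$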


Example 4. Figure 4 illustrates a supervised-cyclic-cf Petri net N which is based on 
the context-free grammar given in Example 1. 


According to the construction of a supervised-cyclic-cf Petri net $N_{sc}$, the firing of
its transitions follows the rules below:

• for each cycle $\rho = p_0 t_1 p_1 t_2 \cdots p_{n-1} t_n p_0 \in \Pi$, the firing of its transitions starts with
  the firing of transition $t_1$, and $t_1$ can occur next when transitions $t_2, t_3, \ldots, t_n$
  have occurred in the defined order;
• for each successful occurrence sequence $\sigma$ of transitions of $N_{sc}$,
  $|\sigma|_{t_1} = |\sigma|_{t_2} = \cdots = |\sigma|_{t_n}$;
• if the occurrence of the transitions of a cycle $\rho_1$ has started then the occurrence
  of the transitions of a second (not necessarily different) cycle $\rho_2$ can start when
  all transitions of $\rho_1$ have fired.


Consequently, 


Proposition 4. For each successful occurrence sequence of transitions $\sigma = t_1 t_2 \cdots t_n$
of a supervised-cyclic-cf Petri net $N_{sc}$, $\sigma = tr(\rho_{i_1}) tr(\rho_{i_2}) \cdots tr(\rho_{i_s})$ for some (not
necessarily different) cycles $\rho_{i_1}, \rho_{i_2}, \ldots, \rho_{i_s} \in \Pi$, $s \geq 1$.
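
With a single token shared through the common place $p_0$, at most one cycle can be in progress at any time. The sketch below checks this on a transition sequence, again modelling the control places only. The function name `supervised_cyclic_ok` is hypothetical, and the sample cycles are an assumed grouping of the transitions that follows the description in Example 7.

```python
def supervised_cyclic_ok(sequence, cycles) -> bool:
    """Check a transition sequence against cycles sharing the control place p_0."""
    position = {t: (c, i) for c, cycle in enumerate(cycles)
                for i, t in enumerate(cycle)}
    active = None                        # None: the token is in p_0
    for t in sequence:
        c, i = position[t]
        if active is None:
            if i != 0:                   # only a first transition can take p_0
                return False
        elif active != (c, i):           # another cycle is still in progress
            return False
        # The last transition of a cycle returns the token to p_0.
        active = None if i == len(cycles[c]) - 1 else (c, i + 1)
    return active is None                # final marking: the token is back in p_0

# Assumed cycles: t0 alone, and the pairs (t1, t5), (t3, t7), (t2, t6), (t4, t8).
SC_CYCLES = [["t0"], ["t1", "t5"], ["t3", "t7"], ["t2", "t6"], ["t4", "t8"]]
```

Here `t0 t1 t5 t3 t7 t4 t8` passes, being a concatenation of cycle traces, whereas the interleaved sequence `t0 t1 t3 t5 t7 t2 t6` is rejected as soon as `t3` tries to start while the cycle of `t1` is unfinished.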
Fig. 4. A supervised-cyclic-cf Petri net $N_{sc}$.


3.2 Grammars and examples 


In this section we define the grammars controlled by the Petri nets introduced in 
the section above. 


Definition 5. A chain- (respectively, cyclic-, supervised-cyclic-) cf Petri net
controlled grammar (in short a ch- (c-, sc-) PN controlled grammar) is a tuple
$G = (V, \Sigma, S, R, N_{ch})$ ($G = (V, \Sigma, S, R, N_c)$, $G = (V, \Sigma, S, R, N_{sc})$) where $V$, $\Sigma$, $S$,
$R$ are defined as for a context-free grammar and $N_{ch}$ ($N_c$, $N_{sc}$) is a chain- (cyclic-,
supervised-cyclic-) cf Petri net with respect to the context-free grammar $(V, \Sigma, S, R)$.


Definition 6. The language generated by a ch- (c-, sc-) Petri net controlled
grammar $G$, denoted by $L(G)$, consists of all strings $w \in \Sigma^*$ such that there is a derivation
$S \xRightarrow{r_1 r_2 \cdots r_k} w \in \Sigma^*$ and a successful occurrence sequence of transitions $\nu = t_1 t_2 \cdots t_k$
of $N_{ch}$ ($N_c$, $N_{sc}$, respectively) such that $r_1 r_2 \cdots r_k = \gamma(t_1 t_2 \cdots t_k)$.


Example 5. Let $G_2 = (V, \Sigma, S, R, N_{ch})$ be a ch-Petri net controlled grammar where
the components $V, \Sigma, S, R$ are defined as for the context-free grammar $G_1$ in Example
1, and $N_{ch}$ is the chain-cf Petri net in Figure 2.

After transition $t_0 = \gamma^{-1}(S \to AB)$ fires, transitions $t_1 = \gamma^{-1}(A \to aA)$,
$t_3 = \gamma^{-1}(A \to bA)$, $t_2 = \gamma^{-1}(A \to a)$, and $t_4 = \gamma^{-1}(A \to b)$ are enabled. Transitions
$t_1$ and (or) $t_3$ can occur several times and in any order, and the corresponding control
places receive as many tokens as the numbers of occurrences of these transitions.
Then transitions $t_5 = \gamma^{-1}(B \to aB)$ and $t_7 = \gamma^{-1}(B \to bB)$ occur as many
times as $t_1$ and (or) $t_3$ occur. To go to a final marking, transition $t_2 = \gamma^{-1}(A \to a)$
or $t_4 = \gamma^{-1}(A \to b)$ occurs, then, respectively, transition $t_6 = \gamma^{-1}(B \to a)$ or
$t_8 = \gamma^{-1}(B \to b)$ occurs. One can see that $G_2$ generates a vector language

$$L(G_2) = \{w x w' x \mid x \in \{a,b\}, w \in \{a,b\}^*, w' \in Perm(w)\}.$$


Example 6. We consider the same context-free grammar as that in Example 1 with
the cyclic-cf Petri net $N_c$ in Figure 3 (the transitions of $N_c$ have the same labels as
those of $N_{ch}$), i.e., $G_3 = (V, \Sigma, S, R, N_c)$.

After transition $t_0$ occurs, transitions $t_1, t_2, t_3, t_4$ are enabled. The following cases
may arise: (1) the sequence $t_1 t_5$ or (and) $t_3 t_7$ occurs many times (in any order); (2)
the sequence $t_1 t_3$ or $t_3 t_1$ occurs, then $t_5 t_7$ or $t_7 t_5$ occurs; (3) cases (1) and (2) repeat
in any order; (4) to go to a final marking the sequence $t_2 t_6$ or $t_4 t_8$ occurs. It is easy
to see that $G_3$ generates a semi-matrix language

$$L(G_3) = \{w_1 \{\lambda, ab, ba\} w_2 \{\lambda, ab, ba\} \cdots x \cdot w_1 \{\lambda, ab, ba\} w_2 \{\lambda, ab, ba\} \cdots x \mid
x \in \{a,b\}, w_1, w_2, \ldots \in \{a,b\}^*\}.$$


Example 7. Let $G_4$ consist of the context-free components of the grammar $G_1$ (Example
1) and the supervised-cyclic-cf Petri net $N_{sc}$ in Figure 4 (it can be seen that the
transitions preserve the labels).

The execution of $N_{sc}$ starts with the occurrence of $t_0$, and transitions $t_1$, $t_2$, $t_3$, and
$t_4$ are enabled. Next, only one of them can occur, enabling, respectively, $t_5$, $t_6$, $t_7$,
and $t_8$. The sequence $t_1 t_5$ or $t_3 t_7$ occurs many times (in any order). To go to a
final marking one of the sequences $t_2 t_6$, $t_4 t_8$ occurs. We can see that $G_4$ generates
a matrix language

$$L(G_4) = \{ww \mid w \in \{a,b\}^+\}.$$


We denote the families of languages generated by ch- (c-, sc-) Petri net
controlled grammars (with erasing rules) by CHPN, CPN, SCPN ($CHPN^\lambda$, $CPN^\lambda$,
$SCPN^\lambda$), respectively.
4 Generative Capacity


In this section we show that the introduced Petri net controlled grammars simulate 
some well-known regulated grammars. 


Lemma 1. 


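(1) $V \subseteq CHPN$ and $V^\lambda \subseteq CHPN^\lambda$,
(2) $sMAT \subseteq CPN$ and $sMAT^\lambda \subseteq CPN^\lambda$,
(3) $MAT \subseteq SCPN$ and $MAT^\lambda \subseteq SCPN^\lambda$.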


Proof. Let $G = (V, \Sigma, S, M)$ be a vector (semi-matrix, matrix) grammar where
$M = \{m_1, m_2, \ldots, m_k\}$. Without loss of generality we can assume that $G$ is a
grammar without repetitions. Let $R$ be the set of all rules of $M$. We construct a cf
Petri net $N = (P, T, F, \varphi, \iota, \beta, \gamma)$ (with the notions of Def. 1).


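(CHAIN): For each matrix $m = r_1 r_2 \cdots r_n \in M$ we define a chain
$\rho_m = t_1 p_1 t_2 p_2 \cdots t_{n-1} p_{n-1} t_n$ where $p_1, p_2, \ldots, p_{n-1}$ are new places and $\gamma(t_i) = r_i$,
$1 \leq i \leq n$. Then $\gamma(tr(\rho_m)) = m$ since $\gamma$ is a bijection. Let $\Pi = \{\rho_{m_1}, \rho_{m_2}, \ldots, \rho_{m_k}\}$.
As the chains of $\Pi$ are disjoint, we construct a chain-cf Petri net $N_{ch}$ (with the notions
of Def. 2), and define the ch-Petri net controlled grammar $G_{ch} = (V, \Sigma, S, R, N_{ch})$.

Let $S \xRightarrow{r_1 r_2 \cdots r_s} w \in \Sigma^*$ be a derivation in the vector grammar $G$. Then
$r_1 r_2 \cdots r_s = sh(m'_1, m'_2, \ldots, m'_l)$ for some $m'_1, m'_2, \ldots, m'_l \in M$. By the bijection $\gamma$:

$$\begin{aligned}
t_1 t_2 \cdots t_s &= \gamma^{-1}(r_1 r_2 \cdots r_s) \\
&= \gamma^{-1}(sh(m'_1, m'_2, \ldots, m'_l)) \\
&= sh(\gamma^{-1}(m'_1), \gamma^{-1}(m'_2), \ldots, \gamma^{-1}(m'_l)) \\
&= sh(tr(\rho'_1), tr(\rho'_2), \ldots, tr(\rho'_l)). \qquad (1)
\end{aligned}$$

Hence, $t_1 t_2 \cdots t_s$ is a successful occurrence sequence in $N_{ch}$, and $S \xRightarrow{r_1 r_2 \cdots r_s} w \in \Sigma^*$
is also a derivation in $G_{ch}$, i.e., $L(G) \subseteq L(G_{ch})$. The inverse inclusion can also be
shown by backtracking (1).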


(CYCLE): For each matrix $m = r_1 r_2 \cdots r_n \in M$ we define a cycle
$\rho_m = p_1 t_1 p_2 \cdots p_n t_n p_1$ where $p_1, p_2, \ldots, p_n$ are new places and $\gamma(t_i) = r_i$,
$1 \leq i \leq n$. Since $\gamma$ is a bijection, $\gamma(tr(\rho_m)) = m$. Let $\Pi = \{\rho_{m_1}, \rho_{m_2}, \ldots, \rho_{m_k}\}$. As $\Pi$
consists of disjoint cycles, we construct a cyclic-cf Petri net $N_c$ (with the notions of
Def. 3) and define the c-Petri net controlled grammar $G_c = (V, \Sigma, S, R, N_c)$.

Let $S \xRightarrow{r_1 r_2 \cdots r_s} w \in \Sigma^*$ be a derivation in the semi-matrix grammar $G$. Then
$r_1 r_2 \cdots r_s = ssh(m'_1, m'_2, \ldots, m'_l)$ for some $m'_1, m'_2, \ldots, m'_l \in M$. By the bijection $\gamma$,

$$\begin{aligned}
t_1 t_2 \cdots t_s &= \gamma^{-1}(r_1 r_2 \cdots r_s) \\
&= \gamma^{-1}(ssh(m'_1, m'_2, \ldots, m'_l)) \\
&= ssh(\gamma^{-1}(m'_1), \gamma^{-1}(m'_2), \ldots, \gamma^{-1}(m'_l)) \\
&= ssh(tr(\rho'_1), tr(\rho'_2), \ldots, tr(\rho'_l)). \qquad (2)
\end{aligned}$$

It follows that $t_1 t_2 \cdots t_s$ is a successful occurrence sequence in $N_c$. Therefore,
$L(G) \subseteq L(G_c)$. The inverse inclusion can be shown in a similar way.


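(SUPERVISED CYCLE): For each matrix $m = r_1 r_2 \cdots r_n \in M$ we define a cycle
$\rho_m = p_0 t_1 p_1 \cdots p_{n-1} t_n p_0$ where $p_0, p_1, \ldots, p_{n-1}$ are new places and $\gamma(t_i) = r_i$,
$1 \leq i \leq n$. Since $\gamma$ is a bijection, $\gamma(tr(\rho_m)) = m$. Let $\Pi = \{\rho_{m_1}, \rho_{m_2}, \ldots, \rho_{m_k}\}$.
As the cycles of $\Pi$ have the unique common place $p_0$, we construct a supervised-cyclic-cf
Petri net $N_{sc}$ (with the notions of Def. 4) and define the sc-Petri net controlled
grammar $G_{sc} = (V, \Sigma, S, R, N_{sc})$.

Let $S \xRightarrow{r_1 r_2 \cdots r_s} w \in \Sigma^*$ be a derivation in the matrix grammar $G$. Then
$r_1 r_2 \cdots r_s = m'_1 m'_2 \cdots m'_l$ for some $m'_1, m'_2, \ldots, m'_l \in M$. By the bijection $\gamma$,

$$\begin{aligned}
t_1 t_2 \cdots t_s &= \gamma^{-1}(r_1 r_2 \cdots r_s) \\
&= \gamma^{-1}(m'_1 m'_2 \cdots m'_l) \\
&= \gamma^{-1}(m'_1) \gamma^{-1}(m'_2) \cdots \gamma^{-1}(m'_l) \\
&= tr(\rho'_1) tr(\rho'_2) \cdots tr(\rho'_l). \qquad (3)
\end{aligned}$$

Thus $t_1 t_2 \cdots t_s$ is a successful occurrence sequence in $N_{sc}$ and $L(G) \subseteq L(G_{sc})$. The
inverse inclusion can be shown using similar arguments.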


By analogous considerations, it can be shown that: 


Lemma 2. 


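(1) $CHPN \subseteq V$ and $CHPN^\lambda \subseteq V^\lambda$,
(2) $CPN \subseteq sMAT$ and $CPN^\lambda \subseteq sMAT^\lambda$,
(3) $SCPN \subseteq MAT$ and $SCPN^\lambda \subseteq MAT^\lambda$.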


The results of Lemmas 1, 2, Theorem 2.1.2 ([4]) and Theorem 7 ([14]) can be 
summarized in the following theorem. 


Theorem 1. The inclusions presented in Figure 5 hold. 
[Figure 5: hierarchy diagram; recoverable node labels: $MAT^\lambda = CPN^\lambda = CHPN^\lambda = SCPN^\lambda$ (top), sMAT = CPN, V = CHPN.]

Fig. 5. If two families are connected by a line (an arrow), then the upper family
includes (includes properly) the lower family.